
How to get internal structure of the PDF file?

Printed From: Foxit's Planet PDF Forum
Category: Lets Talk PDF
Forum Name: General
Forum Description: This is for experienced PDF users. Here you can talk about any topics not set aside for Developers, Prepress and PDF Forms.
Printed Date: 17 Feb 2020 at 5:08pm

Topic: How to get internal structure of the PDF file?
Posted By: Saravanan6
Subject: How to get internal structure of the PDF file?
Date Posted: 10 Jan 2012 at 5:42am
Hi All,

    I would like to know whether any tool can expose the internal structure of a PDF file in an XML-based form, similar to the Open XML representation used by MS Office 2007.

Please enlighten me on this.

Thanks & Regards,

Posted By: aandi
Date Posted: 10 Jan 2012 at 8:50am
What sort of structure? The object tree? Tagging (optional)? A pseudo-structure of visual elements only?
If you aren't sure of the answer, try this instead: what is the purpose of the exercise?
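
To make "the object tree" concrete, here is a minimal sketch that lists the indirect objects in a PDF and their /Type entries, using only the Python standard library. It assumes the objects are stored uncompressed in the file body; files that pack objects into compressed object streams (PDF 1.5 and later) will not show up this way, and a real PDF library would be needed for those. The names `list_objects` and `SAMPLE` are made up for illustration, and `SAMPLE` is a hand-written fragment, not a complete valid PDF.

```python
import re

# Matches "N G obj ... endobj" and captures object number, generation, body.
OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# Matches a "/Type /Name" dictionary entry inside an object body.
TYPE_RE = re.compile(rb"/Type\s*/([A-Za-z]+)")


def list_objects(data: bytes):
    """Return (object number, generation, /Type or None) for each object found."""
    found = []
    for match in OBJ_RE.finditer(data):
        type_match = TYPE_RE.search(match.group(3))
        type_name = type_match.group(1).decode("ascii") if type_match else None
        found.append((int(match.group(1)), int(match.group(2)), type_name))
    return found


# Hypothetical, hand-written fragment used only to exercise the function.
SAMPLE = b"""%PDF-1.4
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [3 0 R] /Count 1 >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>
endobj
4 0 obj
<< /Length 37 >>
stream
BT /F1 12 Tf 72 720 Td (Hello) Tj ET
endstream
endobj
"""

for number, generation, type_name in list_objects(SAMPLE):
    print(number, generation, type_name)
```

The result is a catalog, a page tree, a page and a content stream. That is a tree of low-level objects, not anything resembling the paragraph and table elements of Open XML.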

Posted By: Saravanan6
Date Posted: 10 Jan 2012 at 11:43am


Thanks for your reply.

I am looking for something like the Open XML representation in MS Office 2007, because I want to parse (get or extract) every paragraph, image, table, and graph in the PDF document. I then want to select some of those paragraphs, split the original document, and build a new document that contains only the selected paragraphs, without changing any formatting.

Please enlighten me on this...

Thanks & Regards,

Posted By: aandi
Date Posted: 10 Jan 2012 at 1:22pm
You may be disappointed in what is inside a PDF. There are images, there is text, and there is vector art, drawn a line at a time. There are no tables, no paragraphs, no charts: only collections of text, images and lines which look like them.
Unless the PDF file is tagged, that is. Are you dealing with tagged files?
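
Both points can be illustrated with a rough standard-library sketch. The first function pulls positioned text out of an uncompressed page content stream and shows that what looks like a table row is just separate strings at x/y coordinates plus a drawn line. The second is a crude check for tagging that looks for a /StructTreeRoot entry in the raw bytes. Everything here is an assumption made for illustration: the names `text_runs`, `looks_tagged` and `CONTENT` are hypothetical, the regex only handles the simple `x y Td (string) Tj` pattern, real content streams are usually compressed, and a missing /StructTreeRoot in the raw bytes is not conclusive when the catalog sits inside a compressed object stream.

```python
import re

# Matches the simple pattern "x y Td (string) Tj" in an uncompressed stream.
RUN_RE = re.compile(rb"([-\d.]+)\s+([-\d.]+)\s+Td\s*\((.*?)\)\s*Tj")


def text_runs(stream: bytes):
    """Return (x, y, text) for each simple positioned text run in the stream."""
    return [
        (float(m.group(1)), float(m.group(2)), m.group(3).decode("latin-1"))
        for m in RUN_RE.finditer(stream)
    ]


def looks_tagged(data: bytes) -> bool:
    """Heuristic: a tagged PDF's catalog refers to a structure tree root."""
    return b"/StructTreeRoot" in data


# Hypothetical content stream: two "table cells" and the rule drawn under them.
CONTENT = (
    b"BT /F1 12 Tf 72 720 Td (Name) Tj ET\n"
    b"BT /F1 12 Tf 200 720 Td (Price) Tj ET\n"
    b"72 710 m 300 710 l S\n"
)

for x, y, text in text_runs(CONTENT):
    print(x, y, text)
print("tagged:", looks_tagged(CONTENT))
```

Nothing in the stream says that "Name" and "Price" belong to the same row or to a table at all. Any such grouping has to be inferred from the coordinates, unless the file carries a structure tree.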
