Planet PDF Forum


Hi, welcome to the Foxit Planet PDF Forum. If you have PDF or Adobe Acrobat questions then the right place to ask them is here, in this forum.

PDF Structure Extraction

kpax
New Member

Joined: 12 Oct 2016
Location: south africa
Points: 1
Topic: PDF Structure Extraction
Posted: 13 Oct 2016 at 1:27pm
Hi
 
Please forgive me, but I'm completely new at this, so I have no idea where to start.
I need to extract the structure of a PDF file in XML format, i.e. on a sales advert, all the prices and their x and y co-ordinates.
Is there any software that can do this, and is there an API that could achieve it?
 
Regards
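As a rough illustration of the output being asked for, the Python sketch below assumes the text has already been pulled out of the PDF as (text, x, y) runs by some extraction tool, keeps the runs that look like prices, and writes them to XML with the standard library's `xml.etree.ElementTree`. The sample runs, the `prices`/`price` element names and the price pattern (a currency symbol such as R, $, € or £ followed by an amount) are hypothetical assumptions, not taken from any real advert or tool.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: text runs as (text, x, y) in PDF points, as an
# extraction tool might report them. This is made-up sample data.
RUNS = [
    ("Fresh Milk 1L", 72.0, 700.5),
    ("R12.99", 300.0, 700.5),
    ("Brown Bread", 72.0, 680.0),
    ("R9.50", 300.0, 680.0),
    ("Specials valid until Friday", 72.0, 100.0),
]

# Assumed price format: a currency symbol, an optional space, digits,
# and an optional two-digit decimal part. Adjust for real adverts.
PRICE_RE = re.compile(r"^(?:R|\$|€|£)\s?\d+(?:[.,]\d{2})?$")


def prices_to_xml(runs):
    """Keep only runs that look like prices and serialize them as XML."""
    root = ET.Element("prices")
    for text, x, y in runs:
        text = text.strip()
        if PRICE_RE.match(text):
            # Coordinates are stored as attributes, formatted to 2 decimals.
            price = ET.SubElement(root, "price", x=f"{x:.2f}", y=f"{y:.2f}")
            price.text = text
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(prices_to_xml(RUNS))
```

The regex is deliberately strict (a currency symbol is required) so that dates, pack sizes and other bare numbers on the advert are not mistaken for prices.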
try67
Senior Member


Joined: 15 Jul 2011
Points: 617
Posted: 13 Oct 2016 at 8:15pm
If the file is very consistently structured then it might be possible to do it using a script in Acrobat or a stand-alone application. It's almost impossible to say for sure whether or not it will be possible without seeing some sample files.

If you want, you can send me such files by email, and I'll let you know whether I think it's doable and, if so, how much it will cost to develop such a tool.
- Acrobat Expert - Contact me personally at try6767@gmail.com
Check out my custom-made scripts website: http://try67.blogspot.com
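To show why consistent structure matters, here is a minimal stand-alone sketch in Python (not an Acrobat script) that assumes each product name and its price sit on the same line of the advert: words are grouped into rows by their y coordinate, and each price is paired with the other text on its row. The word list, the 2-point row tolerance and the price pattern are hypothetical; a real advert may need different rules.

```python
import re

# Hypothetical words as (text, x, y) in PDF points. PDF y values grow
# upward from the bottom of the page, so higher y means nearer the top.
WORDS = [
    ("Brown", 72.0, 680.2), ("Bread", 110.0, 680.0), ("R9.50", 300.0, 679.8),
    ("Fresh", 72.0, 700.4), ("Milk", 105.0, 700.5), ("R12.99", 300.0, 700.6),
]

# Assumed price format: a currency symbol followed by an amount.
PRICE_RE = re.compile(r"^(?:R|\$|€|£)\s?\d+(?:[.,]\d{2})?$")


def group_rows(words, tolerance=2.0):
    """Cluster words into rows whose y lies within `tolerance` of the row's first word."""
    rows = []
    for word in sorted(words, key=lambda w: -w[2]):  # top of the page first
        if rows and abs(rows[-1][0][2] - word[2]) <= tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    # Order the words in each row from left to right.
    return [sorted(row, key=lambda w: w[1]) for row in rows]


def label_price_pairs(words):
    """Pair each price with the non-price text on the same row."""
    pairs = []
    for row in group_rows(words):
        label = " ".join(t for t, _, _ in row if not PRICE_RE.match(t))
        for text, _, _ in row:
            if PRICE_RE.match(text):
                pairs.append((label, text))
    return pairs


if __name__ == "__main__":
    for label, price in label_price_pairs(WORDS):
        print(f"{label}: {price}")
```

This only works while the layout rule holds; multi-column adverts or prices printed under the product name would need a different grouping rule, which is why sample files are needed before judging feasibility.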
andruo11
New Member


Joined: 23 Oct 2016
Location: San Francisco
Points: 1
Posted: 23 Oct 2016 at 12:50am
You might try the TET Acrobat plug-in from http://www.pdflib.com. It's free, and it extracts the coordinates of all the text in a PDF in XML format. If that works for you, there's a more advanced programming library you can download.
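If the plug-in's export is TETML (PDFlib's XML format for TET), a short script can read the word coordinates back. The Python sketch below assumes word-level output in which each `Word` element contains a `Text` child and a `Box` child with `llx`, `lly`, `urx` and `ury` attributes (lower-left and upper-right corners); that layout is an assumption from memory and should be checked against the TET documentation for your version. Namespaces are stripped so the code does not depend on a particular TETML namespace URI.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace, e.g. '{ns}Word' -> 'Word'."""
    return tag.rsplit("}", 1)[-1]


def words_with_boxes(tetml):
    """Yield (text, llx, lly, urx, ury) for each Word with Text and Box children.

    `tetml` may be bytes or str. The Word/Text/Box structure is an
    assumed TETML layout, not a verified schema.
    """
    root = ET.fromstring(tetml)
    for elem in root.iter():
        if local(elem.tag) != "Word":
            continue
        text = box = None
        for child in elem:
            name = local(child.tag)
            if name == "Text":
                text = child.text
            elif name == "Box" and box is None:
                box = child  # use the first box if a word has several
        if text is None or box is None:
            continue
        coords = [box.get(k) for k in ("llx", "lly", "urx", "ury")]
        if None in coords:
            continue  # skip boxes missing a coordinate attribute
        yield (text, *map(float, coords))
```

For example, `list(words_with_boxes(pathlib.Path("advert.tetml").read_bytes()))` (the file name is hypothetical) would return one tuple per word, which could then be filtered for prices.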

Forum Software by Web Wiz Forums® version 11.10
Copyright ©2001-2017 Web Wiz Ltd.
