Planet PDF Forum

Batch search PDF text, return page numbers

dianaascher (New Member)
Joined: 11 Aug 2013 | Location: Los Angeles | Points: 2
Posted: 11 Aug 2013 at 9:36pm
Hi there,

I'm trying to use a list of terms (currently residing in an Excel column) to generate a list of page numbers on which each term appears in a large PDF or collection of PDFs. In other words, I want to search a PDF for term A and in the next cell return a series of page numbers where term A appears. Then I want to move on to term B, etc.

I created an index of the PDF, which now exists as a PDX file. When I export the PDX file to CSV, I get a breakdown of each instance within the PDF where term A appears. But to cover every term in the list I would have to repeat this task manually 500 times, and then combine the page numbers into a series for each term.

How can I automate this?

Many thanks,
Diana

gkaiseril (Senior Member)
Joined: 15 Jul 2011 | Location: USA | Points: 4116
Posted: 11 Aug 2013 at 11:00pm
If you have brought the CSV export of the PDX file into Excel, then you have an Excel programming problem.

It is possible to use JavaScript within Acrobat to search for words within a PDF and accumulate the words and their page numbers in an array. Once in an array, the entries could be sorted by word, and additional code could list all the page numbers after each word.
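
The sort-and-group step described above might look roughly like the sketch below. It assumes the hits have already been collected as `{ word, page }` pairs; the helper name `groupPagesByWord` and the sample data are made up for illustration and are not part of Acrobat. The demo lines use `console.log`, so they are meant for Node.js or a browser rather than the Acrobat console.

```javascript
// Hypothetical helper: turn a flat list of (word, page) hits into
// one entry per word with its sorted, de-duplicated page numbers.
function groupPagesByWord(hits) {
  // Copy the array, then sort by word and, within a word, by page number.
  var sorted = hits.slice().sort(function (a, b) {
    if (a.word < b.word) return -1;
    if (a.word > b.word) return 1;
    return a.page - b.page;
  });

  var grouped = [];
  for (var i = 0; i < sorted.length; i++) {
    var hit = sorted[i];
    var last = grouped[grouped.length - 1];
    if (last && last.word === hit.word) {
      // Same word as the previous entry: add the page unless it repeats.
      if (last.pages[last.pages.length - 1] !== hit.page) {
        last.pages.push(hit.page);
      }
    } else {
      grouped.push({ word: hit.word, pages: [hit.page] });
    }
  }
  return grouped;
}

// Made-up sample hits, in the arbitrary order a page scan might produce.
var sampleHits = [
  { word: "index", page: 7 },
  { word: "acrobat", page: 3 },
  { word: "index", page: 2 },
  { word: "index", page: 7 },
  { word: "acrobat", page: 1 }
];

var groupedHits = groupPagesByWord(sampleHits);
for (var g = 0; g < groupedHits.length; g++) {
  console.log(groupedHits[g].word + ": " + groupedHits[g].pages.join(", "));
}
// prints:
// acrobat: 1, 3
// index: 2, 7
```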
 
dianaascher (New Member)
Joined: 11 Aug 2013 | Location: Los Angeles | Points: 2
Posted: 12 Aug 2013 at 2:01am
Hi gkaiseril,

Thanks for your prompt response. You're right that pulling the CSV into Excel doesn't help other than to demonstrate what type of data is generated by the Acrobat term search.

What is key to my task is that I have a list of, say, 500 terms that I want to search within a set of PDFs and have whatever code employed return the array of page numbers for each of the terms in the list. Ideally, I would have an Excel worksheet with Column A as the list of terms and Column B as the page array for the term in Column A.

What I don't know how to do is:

1. pull the search terms from Column A in Excel into the Adobe search function
2. return the search results as an array
3. deposit the array in Column B in Excel
4. initiate the process recursively with the next term in Column A

I'm afraid I've only created some simple macros in Excel, though I understand coding conceptually.

Any help you can offer would be appreciated.
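
As a rough illustration only, the four steps listed above can be reduced to a plain loop once the term list and the page text are available outside Excel and Acrobat. The sketch assumes Column A has been exported as an array of strings and that the text of each page has already been extracted into `pageTexts` (one string per page); the function names `findPagesForTerm` and `buildTermReport` are hypothetical. It uses case-insensitive substring matching, so "index" would also match "indexes". A loop over the terms is enough here; no recursion is needed.

```javascript
// pageTexts[i] is assumed to hold the plain text of page i + 1.
function findPagesForTerm(term, pageTexts) {
  var needle = term.toLowerCase();
  var pages = [];
  for (var i = 0; i < pageTexts.length; i++) {
    // Case-insensitive substring match; page numbers are 1-based.
    if (pageTexts[i].toLowerCase().indexOf(needle) !== -1) {
      pages.push(i + 1);
    }
  }
  return pages;
}

// Build CSV text: term in the first column, its page list in the second.
function buildTermReport(terms, pageTexts) {
  var rows = [];
  for (var t = 0; t < terms.length; t++) {
    var pages = findPagesForTerm(terms[t], pageTexts);
    // Quote both fields so commas inside them do not split the columns;
    // double any quote characters inside the term.
    var term = terms[t].replace(/"/g, '""');
    rows.push('"' + term + '","' + pages.join(", ") + '"');
  }
  return rows.join("\n");
}

// Made-up sample data standing in for Column A and the extracted page text.
var sampleTerms = ["index", "batch search", "missing"];
var samplePages = [
  "Creating an index in Acrobat",
  "Batch search across files",
  "The index and batch search together"
];

var termReport = buildTermReport(sampleTerms, samplePages);
console.log(termReport);
// prints:
// "index","1, 3"
// "batch search","2, 3"
// "missing",""
```

Saved as a .csv file, output in this shape should open in Excel with the terms in Column A and the page lists in Column B.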


aandi (Senior Member)
Joined: 07 Jul 2011 | Points: 18358
Posted: 12 Aug 2013 at 11:02am
There is no API (programming interface) to Acrobat's Search feature, so there is no way to automate it directly.

You could write a JavaScript that extracts each word in turn from each page and compares it against your search strings. That is hard work, though, and much slower than the built-in search.
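
A minimal sketch of that word-by-word approach follows. It relies on the `numPages` property and the `getPageNumWords(nPage)` and `getPageNthWord(nPage, nWord, bStrip)` methods of the Acrobat JavaScript Doc object, as documented in Adobe's JavaScript API reference; check them against your Acrobat version. The names `collectWordPages` and `makeMockDoc` are hypothetical, and the mock document exists only so the sketch can run outside Acrobat. It handles single words only; multi-word terms would need a comparison across consecutive words.

```javascript
// Walk every word on every page and record which pages each word occurs on.
// `doc` is expected to behave like an Acrobat Doc object (assumption).
function collectWordPages(doc) {
  var index = {};
  for (var p = 0; p < doc.numPages; p++) {        // pages are 0-based
    var count = doc.getPageNumWords(p);
    for (var w = 0; w < count; w++) {
      // bStrip = true asks for the word without punctuation and whitespace.
      var word = doc.getPageNthWord(p, w, true).toLowerCase();
      if (!word) continue;
      if (!Object.prototype.hasOwnProperty.call(index, word)) {
        index[word] = [];
      }
      var pages = index[word];
      // Store 1-based page numbers, once per page.
      if (pages[pages.length - 1] !== p + 1) {
        pages.push(p + 1);
      }
    }
  }
  return index;
}

// Stand-in for a Doc object so the sketch can be tried outside Acrobat.
function makeMockDoc(pages) {
  return {
    numPages: pages.length,
    getPageNumWords: function (p) { return pages[p].length; },
    getPageNthWord: function (p, w, strip) { return pages[p][w]; }
  };
}

var mockDoc = makeMockDoc([
  ["Search", "the", "index"],
  ["Index", "terms", "index"]
]);
var wordPages = collectWordPages(mockDoc);
console.log(wordPages["index"].join(", "));
// prints: 1, 2
```

Inside Acrobat, the same function would presumably be called as `collectWordPages(this)` from the JavaScript console or an Action, with `console.println` in place of `console.log`. Looking up each of the 500 terms in the resulting object is then a simple property access, though building the index still means visiting every word once.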