Planet PDF Forum

Identify encrypted or graphics-only pdfs in a list

sdspieg (New Member)
Posted: 27 Feb 2012 at 3:47pm
Is there any way to analyze a set of PDF files and identify which ones are either encrypted or graphics-only? I have looked pretty much everywhere, but I haven't been able to find anything like this (only tools like peepdf or pdfid, which analyze whether a PDF contains harmful scripts, or the like). So now I'm thinking I may have overlooked something... Any help would be greatly appreciated.

-Stephan

try67 (Senior Member)
Posted: 28 Feb 2012 at 11:03am
I might be able to develop a stand-alone app for you that can do this.
To discuss it further, you can contact me personally at try6767 at gmail dot com.
- Acrobat Expert - Contact me personally at try6767@gmail.com
Check out my custom-made scripts website: http://try67.blogspot.com

aandi (Senior Member)
Posted: 28 Feb 2012 at 2:08pm
You may have overlooked that if you ran a text extraction tool on each file in turn, the files that gave errors or produced no text might be the encrypted ones. (Some text extraction tools will, however, work with encrypted files that have no open password.)
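
A rough sketch of that idea in Python, in case it helps: run an extractor over each file and sort the results into "error", "no_text", or "ok". The names `classify` and `pdftotext_extract` are made up for this example, and the default extractor assumes Poppler's `pdftotext` command is installed and on the PATH; any other extraction tool could be plugged in instead.

```python
import subprocess
from typing import Callable


def pdftotext_extract(path: str) -> str:
    """Extract text with Poppler's pdftotext (assumed to be installed and on PATH)."""
    result = subprocess.run(
        ["pdftotext", path, "-"],  # "-" sends the extracted text to stdout
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # a non-zero exit status raises CalledProcessError
    )
    return result.stdout


def classify(path: str, extract: Callable[[str], str] = pdftotext_extract) -> str:
    """Return 'error', 'no_text', or 'ok' for one PDF (hypothetical helper)."""
    try:
        text = extract(path)
    except Exception:
        # The tool failed: the file may be encrypted or corrupted
        # (or the tool itself is missing).
        return "error"
    if not text.strip():
        # Nothing but whitespace or page breaks: possibly graphics-only or encrypted.
        return "no_text"
    return "ok"
```

Anything that comes back as "error" or "no_text" is only a candidate. As noted above, some tools extract text from encrypted files without an open password, so those would show up as "ok".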

sdspieg (New Member)
Posted: 28 Feb 2012 at 3:58pm
Well, the problem is that we have thousands of files that we upload to a server to be text-mined, and only THEN do we find out what the problem is. So it would be much nicer to be able to identify these PDFs ahead of time...
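
One possible way to screen files before uploading, sketched with the Python standard library only. It relies on the fact that an encrypted PDF carries an `/Encrypt` entry in its trailer, and simply scans the raw bytes for that name plus the usual header and end-of-file markers. This is a crude heuristic, not a parser: the function name `precheck` is made up here, the scan can give false positives or negatives, and it cannot tell whether a file is graphics-only.

```python
from pathlib import Path


def precheck(path: str) -> list[str]:
    """Return a list of suspected problems for one file (empty list = nothing found)."""
    data = Path(path).read_bytes()
    problems = []
    # A PDF normally starts with a "%PDF-" header near the top of the file.
    if b"%PDF-" not in data[:1024]:
        problems.append("no PDF header")
    # A complete PDF normally ends with an "%%EOF" marker.
    if b"%%EOF" not in data[-1024:]:
        problems.append("no EOF marker (truncated?)")
    # Encrypted PDFs reference an encryption dictionary via /Encrypt.
    if b"/Encrypt" in data:
        problems.append("has /Encrypt entry")
    return problems
```

Looping this over a folder and printing the files with a non-empty result would give a shortlist to inspect before the upload.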

sdspieg (New Member)
Posted: 12 Mar 2012 at 12:46pm
For those who are interested in a free solution: I followed aandi's cue and used 'Some PDF to Text Converter', a freeware PDF-to-text converter. In essence, there can be (at least) 2 problems with PDF files:
  • they can be corrupted: in this case, the program crashes, and you can then check in the File Explorer which file made it crash (it will be the first one in the list that hasn't been converted yet) and see if you can fix it somehow (or re-download it, for instance); and
  • they can be encrypted or graphics-only: here, if you sort the files by size, you will immediately see the files that didn't get extracted, and you can then 'fix' those (e.g. with APDFPR, or by running an OCR program like Adobe Acrobat Pro, Nitro, or ABBYY FineReader).
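
For thousands of files, the sort-by-size check can be scripted. The sketch below assumes the converter writes one `.txt` file per PDF with the same base name into an output folder; that layout, the function name `unconverted`, and the `min_bytes` threshold are assumptions for the example, not something the converter guarantees.

```python
from pathlib import Path


def unconverted(pdf_dir: str, txt_dir: str, min_bytes: int = 1) -> list[str]:
    """List PDFs whose .txt counterpart is missing or smaller than min_bytes."""
    flagged = []
    # Note: "*.pdf" is case-sensitive on some systems, so "FILE.PDF" may be skipped.
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        txt = Path(txt_dir) / (pdf.stem + ".txt")
        if not txt.exists() or txt.stat().st_size < min_bytes:
            flagged.append(pdf.name)
    return flagged
```

The flagged names are the candidates for password removal or OCR; raising `min_bytes` also catches files where only a few stray characters were extracted.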
Hope this helps...

-Stephan


Edited by sdspieg - 12 Mar 2012 at 12:46pm