
Detect Vector Images

Printed From: Foxit's Planet PDF Forum
Category: Lets Talk PDF
Forum Name: PDF Developers
Forum Description: This conference is for PDF developers.
URL: http://forums.planetpdf.com/forum_posts.asp?TID=4055
Printed Date: 19 Oct 2019 at 6:10am


Topic: Detect Vector Images
Posted By: dawerj
Date Posted: 28 Jan 2019 at 2:25pm
Hey Folks,

For a specific use case, I need to detect whether a PDF contains only vector images, only text, only raster images, or some combination of the three.

I'm developing in Python. I've gone through the kingPDF tutorial and written something that, so far, can read the raw stream, parse out all the objects, classify them into types, and build a parent/child tree as well as dependency trees. It can also inflate anything that is compressed with Flate. Decoding the text using the character sets from the font files in the doc is not quite there yet; it only handles things that are UTF-8 right now.
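To make the inflate step concrete, here is a minimal sketch of the kind of thing described above, not the actual code. It assumes the whole file is in memory as bytes, that each stream's dictionary directly follows "obj", and that only a single /FlateDecode filter with no predictor is in play. The names STREAM_RE and inflate_streams are made up for illustration, and finding "endstream" by search instead of honoring /Length is a shortcut that can fail on binary data.

```python
import re
import zlib

# "N G obj << dict >> stream <EOL> data endstream"; the tempered dot keeps a
# match from running across an "endobj" into the next object.
STREAM_RE = re.compile(
    rb"obj\s*(<<(?:(?!endobj).)*?>>)\s*stream\r?\n(.*?)endstream",
    re.DOTALL,
)


def inflate_streams(raw):
    """Return (dictionary_bytes, decoded_bytes) for each Flate stream found."""
    results = []
    for match in STREAM_RE.finditer(raw):
        dictionary, data = match.group(1), match.group(2)
        if b"/FlateDecode" not in dictionary:
            continue  # uncompressed or some other filter: out of scope here
        try:
            # decompressobj stops at the end of the zlib data, so the EOL
            # before "endstream" lands in unused_data instead of raising.
            decoded = zlib.decompressobj().decompress(data)
        except zlib.error:
            continue  # e.g. a filter chain where Flate is not applied first
        results.append((dictionary, decoded))
    return results
```

Calling inflate_streams(open(path, "rb").read()) gives back every Flate stream it could decode, paired with its dictionary so the caller can still check /Type and /Subtype.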

But I feel there must be an easier path. Is there a simpler way to determine the class of an object, or am I stuck with decoding them all to get that information?
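One possible shortcut, sketched here as an untested idea and not a known-good answer: once a page content stream is inflated, the painting operators alone hint at the class, without decoding any text. Tj, TJ, ' and " show text; S, s, f, F, f*, B, B*, b and b* paint paths; BI starts an inline image; Do paints an XObject, which is only raster if that XObject has /Subtype /Image. The function names and the image_names parameter below are hypothetical, and the caller is assumed to have collected the image XObject names from the page resources. The inline-image skip is a heuristic and can be fooled by binary data or by strings containing "BI".

```python
import re

TEXT_OPS = {"Tj", "TJ", "'", '"'}
PAINT_OPS = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*"}

INLINE_IMAGE_RE = re.compile(r"\bBI\s.*?\sID\s.*?\sEI\b", re.DOTALL)
HEX_STRING_RE = re.compile(r"(?<!<)<(?!<)[0-9A-Fa-f\s]*>")


def strip_literals(text):
    """Drop (literal strings) and % comments so they are not read as operators."""
    out, depth, i = [], 0, 0
    while i < len(text):
        ch = text[i]
        if depth:  # inside a literal string
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
        elif ch == "(":
            depth = 1
            out.append(" ")
        elif ch == "%":  # comment runs to the end of the line
            while i < len(text) and text[i] not in "\r\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def classify_content(stream, image_names=frozenset()):
    """Return a subset of {"text", "vector", "raster"} for one decoded stream."""
    text = stream.decode("latin-1")
    text = INLINE_IMAGE_RE.sub(" BI ", text)  # collapse inline image data
    text = HEX_STRING_RE.sub(" ", strip_literals(text))
    tokens = re.sub(r"[\[\]]", " ", text).replace("/", " /").split()
    found = set()
    for prev, tok in zip([""] + tokens, tokens):
        if tok in TEXT_OPS:
            found.add("text")
        elif tok in PAINT_OPS:
            found.add("vector")
        elif tok == "BI" or (tok == "Do" and prev.lstrip("/") in image_names):
            found.add("raster")
    return found
```

Clipping paths (W n) paint nothing, so they are deliberately not counted as vector. Form XObjects invoked with Do have their own content streams and would need the same scan applied to them.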

Cheers,
Dawer


