
cross reference table and byte offset

Printed From: Foxit's Planet PDF Forum
Category: Lets Talk PDF
Forum Name: PDF Standards
Forum Description: Discuss PDF/X, PDF/A, PDF/E, PDF/UA and PDF/H here.
URL: http://forums.planetpdf.com/forum_posts.asp?TID=2211
Printed Date: 23 Nov 2017 at 11:05am
Software Version: Web Wiz Forums 11.10 - http://www.webwizforums.com


Topic: cross reference table and byte offset
Posted By: Leon
Subject: cross reference table and byte offset
Date Posted: 15 Jul 2017 at 6:55pm
I got a sample PDF from the Internet; its content is listed below.

%PDF-1.7
1 0 obj
<<
 /Type /Catalog
 /Outlines 2 0 R 
 /Pages 3 0 R
>>
endobj
2 0 obj
<<
 /Type /Outlines 
 /Count 0
>>
endobj
3 0 obj
<<
 /Type /Pages
 /Kids [4 0 R]
 /Count 1
>>
endobj
4 0 obj
<<
 /Type /Page %
 /Parent 3 0 R
 /MediaBox [0 0 612 792]
 /Contents 5 0 R
 /Resources
 << /ProcSet 6 0 R
    /Font << /F1 7 0 R >>
 >>
>>
endobj
5 0 obj
<< /Length 48 >>
stream
BT
/F1 24 Tf
100 700 Td
(Hello World)Tj
ET
endstream
endobj
6 0 obj
[/PDF /Text]
endobj
7 0 obj
<<
 /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont /Helvetica
 /Encoding /MacRomanEncoding
>>
endobj
xref
0 8
0000000000 65535 f
0000000012 00000 n
0000000089 00000 n
0000000145 00000 n
0000000214 00000 n
0000000381 00000 n
0000000485 00000 n
0000000518 00000 n
trailer
<<
 /Size 8
 /Root 1 0 R
>>
startxref
642
%%EOF

Adobe Reader DC opens the document without any problem, but when I close it without making any changes, it asks whether I want to save it. By searching online, I learned that the source of the problem is the cross-reference table. More precisely, the byte offsets of all the objects, as well as the offset of the xref table itself (the value after startxref), are wrong. I used a hex editor to find the correct byte offsets and fixed them, and the problem went away.
My question is this: if the xref table is used to locate objects in the byte stream, then for the PDF above it will certainly point to the wrong places. Since Adobe Reader opens this PDF without any problem, it looks to me as though Reader does not rely on the xref table. But if that is the case, what is the xref table for?

I'd appreciate any input.
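The manual hex-editor fix described above can also be scripted. The following is a minimal Python sketch, not a robust repair tool: it assumes a single classic (uncompressed) xref table at the end of the file, objects numbered 1..N with no gaps, and no stream containing text that looks like an object header. It scans the body for `N G obj` headers, records their byte offsets, and writes a fresh xref table (each entry exactly 20 bytes, as the PDF specification requires), the old trailer with an updated /Size, and a correct startxref value. The name `rebuild_xref` is hypothetical.

```python
import re

# Indirect object header such as "4 0 obj" at the start of a line.
OBJ_HEADER = re.compile(rb"(?m)^(\d+)\s+(\d+)\s+obj\b")


def rebuild_xref(data: bytes) -> bytes:
    """Return a copy of `data` with a regenerated xref table, trailer and startxref.

    Sketch only: assumes one classic xref table at the end of the file,
    objects numbered 1..N with no gaps, and no object-like text inside streams.
    """
    # Everything before the old "xref" keyword is kept byte-for-byte.
    body = data[: data.rindex(b"\nxref") + 1]
    # Reuse the old trailer dictionary (keeps /Root, /Info, ...).
    old_trailer = data[data.rindex(b"trailer"): data.rindex(b"startxref")]

    # Scan the body and record the byte offset of every object header;
    # a later definition of the same object number overrides an earlier one.
    offsets = {}
    for m in OBJ_HEADER.finditer(body):
        offsets[int(m.group(1))] = (m.start(), int(m.group(2)))

    size = max(offsets) + 1
    missing = [n for n in range(1, size) if n not in offsets]
    if missing:
        raise ValueError(f"objects not numbered contiguously, missing {missing}")

    # Object 0 heads the free list; each entry is exactly 20 bytes ("... n \n").
    table = [b"xref\n", b"0 %d\n" % size, b"0000000000 65535 f \n"]
    for num in range(1, size):
        offset, gen = offsets[num]
        table.append(b"%010d %05d n \n" % (offset, gen))

    trailer = re.sub(rb"/Size\s+\d+", b"/Size %d" % size, old_trailer)
    xref_offset = len(body)  # the xref keyword starts right after the body
    return body + b"".join(table) + trailer + b"startxref\n%d\n%%%%EOF\n" % xref_offset
```

To use it, read the PDF in binary mode, pass the bytes to `rebuild_xref`, and write the result to a new file; offsets are counted in bytes, so the file must never be opened in text mode. The sketch does not recompute stream /Length values.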



Replies:
Posted By: BAlheit
Date Posted: 16 Jul 2017 at 8:27am
Acrobat Reader does use the xref table. In your case, Acrobat Reader repairs the offsets.


Posted By: Leon
Date Posted: 16 Jul 2017 at 2:08pm
Thank you very much for your reply.

For Acrobat Reader to do that (repair the wrong offsets), it cannot rely on the xref table; it must search for the objects directly in the document. So, back to my question: if everyone (including Adobe) has to search for objects directly, what is the xref table for?



Posted By: BAlheit
Date Posted: 17 Jul 2017 at 7:04am
Originally posted by Leon:

... In order for Acrobat reader to do that (to repair wrong offsets), it must not rely on xref but search objects directly from the document. ...
Acrobat Reader does use the xref table! When there is a small difference between an xref offset and the correct offset, Acrobat Reader corrects the value.
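One way a reader might use the table while tolerating small errors is sketched below in Python. This is an illustration under stated assumptions, not Acrobat's actual algorithm: it follows startxref to a single classic xref table, jumps straight to the recorded offset, and only if the expected `N G obj` header is not there does it search a small window around that offset. The names `read_xref` and `find_object` and the 64-byte window are hypothetical.

```python
from __future__ import annotations

import re


def read_xref(data: bytes) -> dict[int, tuple[int, int]]:
    """Parse the classic xref table that startxref points to.

    Returns {object number: (offset, generation)} for in-use ("n") entries.
    Assumes one uncompressed table: no incremental updates, no xref streams.
    """
    start = int(data[data.rindex(b"startxref"):].split()[1])
    tokens = data[start:].split()
    if not tokens or tokens[0] != b"xref":
        raise ValueError(f"startxref {start} does not point at an xref table")
    entries = {}
    i = 1
    while tokens[i] != b"trailer":
        first, count = int(tokens[i]), int(tokens[i + 1])  # subsection header
        i += 2
        for k in range(count):
            offset, gen, kind = tokens[i:i + 3]
            if kind == b"n":
                entries[first + k] = (int(offset), int(gen))
            i += 3
    return entries


def find_object(data: bytes, entries: dict[int, tuple[int, int]],
                num: int, window: int = 64) -> int:
    """Return the byte offset of object `num`, tolerating a slightly wrong entry."""
    offset, gen = entries[num]
    # "(?<!\d)" stops "1 0 obj" from matching inside "11 0 obj".
    header = re.compile(rb"(?<!\d)%d\s+%d\s+obj\b" % (num, gen))
    if header.match(data, offset):
        return offset  # fast path: the table is right, one seek suffices
    # Hypothetical repair rule: look only `window` bytes either side.
    m = header.search(data, max(0, offset - window), offset + window)
    if m is None:
        raise LookupError(f"object {num} not found near offset {offset}")
    return m.start()
```

With a correct table, each lookup costs one seek and one short match; the windowed search only runs when an entry is off, which matches the "small difference" case described in the post.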


Posted By: gkaiseril
Date Posted: 19 Jul 2017 at 12:09am
If the table is correct and matches the file size, no correction is needed. But if the table appears wrong, both Acrobat and Reader will attempt to repair it.


Posted By: Rob Lyman
Date Posted: 24 Jul 2017 at 8:48pm
The cross-reference table is used as an optimization to allow software to quickly locate important data objects, but it is possible to locate these objects just by scanning the contents of the file.

When the PDF format was introduced in the 1990s, machines were slow enough that this speed-up was necessary. With modern hardware, this optimization is far less critical except for extremely large PDFs, but the xref table is still required to be correct in a syntactically valid PDF.



-------------
Rob Lyman
Software Engineer
http://www.datalogics.com
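Whether a file meets the requirement that its xref table be correct can be checked mechanically by comparing each entry with the bytes it points to. Below is a minimal Python sketch; `check_xref` is a hypothetical name, and it assumes one uncompressed xref table with a single subsection (no incremental updates or cross-reference streams). It reports a startxref value that does not point at the table, any entry that is not exactly 20 bytes including its end-of-line marker (the fixed width the PDF specification prescribes), and any in-use entry whose offset does not land on the matching `N G obj` header.

```python
from __future__ import annotations

import re

XREF_HEAD = re.compile(rb"xref\s+(\d+)\s+(\d+)[ \t]*(?:\r\n|\r|\n)")
# One entry: 10-digit offset, 5-digit generation, n/f, then an end-of-line marker.
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])(\r\n| \r| \n|\r|\n)")


def check_xref(data: bytes) -> list[str]:
    """Report every way the xref table disagrees with the file (empty = OK).

    Sketch only: assumes one uncompressed xref table with a single subsection.
    """
    start = int(data[data.rindex(b"startxref"):].split()[1])
    head = XREF_HEAD.match(data, start)
    if head is None:
        # Without a valid startxref the table cannot be found, so stop here.
        return [f"startxref {start} does not point at an xref table"]
    problems = []
    first, count = int(head.group(1)), int(head.group(2))
    pos = head.end()
    for num in range(first, first + count):
        m = ENTRY.match(data, pos)
        if m is None:
            problems.append(f"entry {num}: malformed")
            break
        if len(m.group(0)) != 20:
            problems.append(f"entry {num}: {len(m.group(0))} bytes, expected 20")
        offset, gen, kind = int(m.group(1)), int(m.group(2)), m.group(3)
        header = re.compile(rb"%d\s+%d\s+obj\b" % (num, gen))
        if kind == b"n" and not header.match(data, offset):
            problems.append(f"object {num}: offset {offset} is not '{num} {gen} obj'")
        pos = m.end()
    return problems
```

An empty list means the table is consistent with the file. Note that the checker stops at the first test when startxref is wrong, since the table cannot be located without it.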


