Posted to dev@pdfbox.apache.org by "Michael Klink (Jira)" <ji...@apache.org> on 2020/07/15 09:23:00 UTC

[jira] [Comment Edited] (PDFBOX-4915) "Page tree root must be a dictionary" on PDDocument.load

    [ https://issues.apache.org/jira/browse/PDFBOX-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17158011#comment-17158011 ] 

Michael Klink edited comment on PDFBOX-4915 at 7/15/20, 9:22 AM:
-----------------------------------------------------------------

There are some errors in the cross reference table of your PDF:
 * It has multiple entries like this:
{noformat}
0000000000 00000 n
{noformat}
These entries claim that the corresponding object can be found at the start (offset 0) of the file. What is actually there, after two comment lines, is object 1, which is not the expected object. Thus, these pointers are incorrect.
 If these entries are intended to mean something like "unused" or {{null}}, a free entry ("{{oooooooooo ggggg f}}") should have been used instead.

 * It has only a single cross reference table and no incremental updates, yet it already contains generation 1 objects, e.g. object 1:
{noformat}
xref
0 1712
0000000000 65535 f 
0000000015 00001 n  
{noformat}
This is invalid. In the first document revision there may only be generation 0 objects:
{panel:title=ISO 32000-1, section 7.5.4 "Cross-Reference Table"}
Except for object number 0, all objects in the cross-reference table shall initially have generation numbers of 0.
{panel}
This object 1 with generation 1 actually is the page tree root. PDFBox probably has problems with this invalid generation number.
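
The two checks described above can be sketched as a small script. This is only an illustration in Python, not how PDFBox parses cross reference tables. The function name {{check_xref_section}} and the message strings are made up for this example, and it assumes a classic (non-stream) cross reference section that has already been split into text lines.

```python
import re

# One xref entry: 10-digit number, 5-digit generation, 'n' (in use) or 'f' (free).
ENTRY_RE = re.compile(r"^(\d{10}) (\d{5}) ([nf])\s*$")


def check_xref_section(lines, first_revision=True):
    """Return (object_number, problem) tuples for one classic xref section.

    Hypothetical helper for illustration; only in-use entries are checked.
    """
    if not lines or lines[0].strip() != "xref":
        raise ValueError("expected the 'xref' keyword on the first line")
    problems = []
    obj_num, remaining = 0, 0
    for line in lines[1:]:
        if line.strip() == "trailer":
            break
        if not line.strip():
            continue
        if remaining == 0:
            # Subsection header: "<first object number> <number of entries>"
            first, count = line.split()
            obj_num, remaining = int(first), int(count)
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            problems.append((obj_num, "malformed entry"))
        else:
            value, generation, kind = int(match[1]), int(match[2]), match[3]
            if kind == "n" and value == 0:
                # The file starts with the header comment, so no object can be at offset 0.
                problems.append((obj_num, "in-use entry points at offset 0"))
            if kind == "n" and first_revision and generation != 0:
                problems.append((obj_num, "in-use entry has non-zero generation in first revision"))
        obj_num += 1
        remaining -= 1
    return problems


# Entries modelled on the ones quoted above (the subsection count is shortened to 3).
sample = [
    "xref",
    "0 3",
    "0000000000 65535 f ",
    "0000000015 00001 n  ",
    "0000000000 00000 n",
]
for number, problem in check_xref_section(sample):
    print(number, problem)
```

The {{first_revision}} flag is there because generation numbers above 0 are legitimate once incremental updates exist, so that check only makes sense for a file with a single cross reference table like this one.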



> "Page tree root must be a dictionary" on PDDocument.load
> --------------------------------------------------------
>
>                 Key: PDFBOX-4915
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4915
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.19
>            Reporter: Gauthier Roebroeck
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>         Attachments: Black Bullet - Volume 01 - Those Who Would Be Gods [Yen Press][Kobo_Kitzoku].pdf, Screenshot 2020-07-14 at 20.19.40.png
>
>
> Hi,
> I have a PDF file that throws the following exception:
> {{java.io.IOException: Page tree root must be a dictionary
>   at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:198) ~[pdfbox-2.0.19.jar:2.0.19]
>   at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:226) ~[pdfbox-2.0.19.jar:2.0.19]
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1222) ~[pdfbox-2.0.19.jar:2.0.19]
>   at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1122) ~[pdfbox-2.0.19.jar:2.0.19]}}
> This happens when loading the document from an InputStream.
> The document can be opened properly using Preview on Mac.
>  
> I have checked the PDF structure (even though I don't know it very well). From what I can see, it could be because /Pages is not the first element under /Root.
>  
> !Screenshot 2020-07-14 at 20.19.40.png!


