Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2017/09/06 16:48:00 UTC

[jira] [Commented] (PDFBOX-3923) Expected a long type at offset 52152, instead got 'xref'

    [ https://issues.apache.org/jira/browse/PDFBOX-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16155655#comment-16155655 ] 

Andreas Lehmkühler commented on PDFBOX-3923:
--------------------------------------------

{code}
xref
0 39
0000000000 65535 f 
0000000015 00000 n 
0000000256 00000 n 
.... other entries
0000048836 00000 n 
0000052152 00000 n 
trailer
<<
/Root 2 0 R
/Size 39
/Info 1 0 R
>>
startxref
52152
{code}
There is a loop in the xref table: the last entry (offset 52152) points to the beginning of the xref table itself, which is the same offset given by startxref. I guess the other readers are able to read the PDF because they parse objects on demand. PDFBox still parses all objects up front and stumbles over that object reference. We will have to wait for the parse-on-demand feature.
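
A minimal sketch of how such a self-referencing entry could be detected before any object is dereferenced. This is not PDFBox code: the class XrefLoopCheck and its method are hypothetical names, and the sketch assumes a classic (non-stream) xref table that is already available as text, starting at the "xref" keyword.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of PDFBox: flags xref entries that point at the xref table itself. */
final class XrefLoopCheck {

    /**
     * @param xrefSection text of a classic xref table, starting at the "xref" keyword
     * @param startXref   byte offset that follows the "startxref" keyword
     * @return object numbers of in-use entries whose offset equals startXref
     */
    static List<Integer> findSelfReferencingEntries(String xrefSection, long startXref) {
        List<Integer> suspicious = new ArrayList<>();
        int nextObjectNumber = 0;
        for (String rawLine : xrefSection.split("\\R")) {
            String line = rawLine.trim();
            if (line.equals("trailer")) {
                break; // end of the table
            }
            String[] parts = line.split("\\s+");
            if (parts.length == 2 && isDigits(parts[0]) && isDigits(parts[1])) {
                // subsection header: "<first object number> <entry count>"
                nextObjectNumber = Integer.parseInt(parts[0]);
            } else if (parts.length == 3 && isDigits(parts[0]) && isDigits(parts[1])) {
                // entry: "<offset> <generation> <n|f>"; only in-use ("n") entries carry an offset
                if (parts[2].equals("n") && Long.parseLong(parts[0]) == startXref) {
                    suspicious.add(nextObjectNumber);
                }
                nextObjectNumber++;
            }
            // anything else (the "xref" keyword, blank lines) is skipped
        }
        return suspicious;
    }

    private static boolean isDigits(String s) {
        return !s.isEmpty() && s.chars().allMatch(c -> c >= '0' && c <= '9');
    }
}
```

Calling findSelfReferencingEntries with the table text and the startxref value 52152 would report the object number of the entry whose offset is 52152. A parser could skip or null out such an entry instead of trying to read an object header at that position.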

> Expected a long type at offset 52152, instead got 'xref'
> --------------------------------------------------------
>
>                 Key: PDFBOX-3923
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3923
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0, 2.0.7
>         Environment: Java 1.8.0_144-b01
>            Reporter: Tres Finocchiaro
>         Attachments: P17090403580.pdf
>
>
> This reads as a duplicate of PDFBOX-2441, PDFBOX-3179 and several others marked as resolved in 2.0.0; however, this bug is reproducible in PDFBox 2.0.0 as well as PDFBox 2.0.7.
> The attached PDF file can be parsed by Chrome (PDFium), Mozilla (pdf.js), Edge, Windows 10 Reader and Adobe Acrobat, but fails with PDFBox 2.0.0 and PDFBox 2.0.7 with the following error:
> {code}
> Exception in thread "main" java.io.IOException: Error: Expected a long type at offset 52152, instead got 'xref'
>         at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1358)
>         at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1286)
>         at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:760)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
>         at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
>         at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
>         at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
>         at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1011)
>         at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:949)
>         at org.apache.pdfbox.tools.PrintPDF.main(PrintPDF.java:140)
>         at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:72)
> Caused by: java.lang.NumberFormatException: For input string: "xref"
>         at java.lang.NumberFormatException.forInputString(Unknown Source)
>         at java.lang.Long.parseLong(Unknown Source)
>         at java.lang.Long.parseLong(Unknown Source)
>         at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1353)
>         ... 11 more
> {code}
> We did not generate this PDF file, so we do not know its origin, but the creator has given permission to share it publicly for troubleshooting purposes. We can forward any questions to the creator on request.


