Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/10/11 01:59:41 UTC

[jira] [Updated] (PDFBOX-2351) /XRefStm content missing in saved file

     [ https://issues.apache.org/jira/browse/PDFBOX-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Hewson updated PDFBOX-2351:
--------------------------------
    Fix Version/s: 2.0.0

> /XRefStm content missing in saved file 
> ---------------------------------------
>
>                 Key: PDFBOX-2351
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2351
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>             Fix For: 2.0.0
>
>
> Steps to reproduce:
> - open immo-kurier_arsenal_93x62.pdf, PDFBOX-1577.pdf, PDFBOX-1756-436857.pdf, PDFBOX-2251-070075.pdf, test-landscape2.pdf, or any other file that has an /XRefStm, using loadNonSeq()
> - call getDocumentCatalog()
> - save to another file
> - open that file with loadNonSeq()
> The last step fails with:
> {code}
> java.io.IOException: Error: Expected a long type at offset 688, instead got 'ï»¿"'
> 	at org.apache.pdfbox.pdfparser.BaseParser.readLong(BaseParser.java:1718)
> 	at org.apache.pdfbox.pdfparser.BaseParser.readObjectNumber(BaseParser.java:1645)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parseXrefObjStream(NonSequentialPDFParser.java:548)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:410)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:794)
> 	at org.apache.pdfbox.pdmodel.PDDocument.loadNonSeq(PDDocument.java:1156)
> {code}
> The saved file still has the old /XRefStm value, but the xref stream content itself is missing. I debugged a bit and it is confusing: the /XRefStm is never read; instead the /Prev is used, which leads to an old-style xref table. When saving, doWriteXRef() keeps the existing /XRefStm value even if PDFBox "believes" it has no XRefStream, whereas doWriteXRefInc() is smarter and deletes the item if there is no XRefStream.
> I haven't tested this with 1.8. Once there is a fix, we should test it there too.
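
For illustration only, the behaviour described for doWriteXRefInc() (drop the /XRefStm item when no XRefStream is going to be written) could be modelled roughly as below. This is a toy sketch, not PDFBox code: the trailer is reduced to a plain map, and the class name TrailerSketch, the method prepareTrailer, and all offsets are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Toy model of a PDF trailer dictionary (key name -> numeric value).
 * NOT PDFBox code; class, method and values are hypothetical.
 */
public class TrailerSketch {

    /**
     * Returns a copy of the loaded trailer that is safe to write out.
     * If no cross-reference stream will be written, the stale XRefStm
     * entry is removed so it cannot point at unrelated bytes in the new file.
     */
    public static Map<String, Long> prepareTrailer(Map<String, Long> loaded,
                                                   boolean hasXRefStream) {
        // Copy first so the caller's map is left untouched.
        Map<String, Long> out = new LinkedHashMap<>(loaded);
        if (!hasXRefStream) {
            out.remove("XRefStm");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> loaded = new LinkedHashMap<>();
        loaded.put("Size", 42L);      // illustrative values only
        loaded.put("Prev", 116L);
        loaded.put("XRefStm", 688L);  // offset that was valid only in the original file

        // No xref stream will be written: the stale entry is dropped.
        System.out.println(prepareTrailer(loaded, false).containsKey("XRefStm")); // false
        // An xref stream will be written: the entry is kept.
        System.out.println(prepareTrailer(loaded, true).containsKey("XRefStm"));  // true
    }
}
```

Whether the full-save path should simply drop the entry in the same way, or instead write a real xref stream, is a design decision for the actual fix; the sketch only shows the first option.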


