Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2015/02/23 08:36:12 UTC
[jira] [Comment Edited] (PDFBOX-2527) IOException: Negative seek offset in NonSequentialPDFParser

    [ https://issues.apache.org/jira/browse/PDFBOX-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333073#comment-14333073 ] 

Andreas Lehmkühler edited comment on PDFBOX-2527 at 2/23/15 7:35 AM:
---------------------------------------------------------------------

After my latest changes, the "Negative seek offset" exception no longer appears. Instead, an IOException pointing to the real cause is thrown (the PDF is incomplete and has some additional junk at the end):

{code}
Exception in thread "main" java.io.IOException: Corrupt XRefTable Entry - ObjID: 23
        at org.apache.pdfbox.pdfparser.COSParser.parseXrefTable(COSParser.java:1789)
        at org.apache.pdfbox.pdfparser.COSParser.parseXref(COSParser.java:239)
        at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:314)
        at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:373)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:811)
        at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:757)
        at org.apache.pdfbox.tools.PDFReader.parseDocument(PDFReader.java:375)
        at org.apache.pdfbox.tools.PDFReader.openPDFFile(PDFReader.java:340)
        at org.apache.pdfbox.tools.PDFReader.main(PDFReader.java:326)
        at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:80)
{code}

I'm thinking about another self-repair mechanism to handle corrupt files like this, but I have to run some tests first.
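
As a rough illustration of what "junk at the end" means, here is a minimal sketch that looks for the last "%%EOF" marker in a byte array and counts the non-whitespace bytes that follow it. This is not the repair mechanism mentioned above and not PDFBox code; the class TrailingJunkCheck and both method names are hypothetical, and a real parser would have to deal with many more cases (for example a file that is cut off before any "%%EOF").

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: detects bytes after the last %%EOF marker.
public class TrailingJunkCheck {
    private static final byte[] EOF_MARKER = "%%EOF".getBytes(StandardCharsets.US_ASCII);

    /** Returns the index just past the last "%%EOF" marker, or -1 if there is none. */
    public static int endOfLastEofMarker(byte[] data) {
        // Scan backwards so the first hit is the last marker in the file.
        for (int start = data.length - EOF_MARKER.length; start >= 0; start--) {
            boolean match = true;
            for (int i = 0; i < EOF_MARKER.length; i++) {
                if (data[start + i] != EOF_MARKER[i]) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return start + EOF_MARKER.length;
            }
        }
        return -1;
    }

    /** Counts non-whitespace bytes after the last marker; returns -1 if no marker exists. */
    public static int trailingJunkBytes(byte[] data) {
        int end = endOfLastEofMarker(data);
        if (end < 0) {
            return -1;
        }
        int junk = 0;
        for (int i = end; i < data.length; i++) {
            byte b = data[i];
            if (b != '\r' && b != '\n' && b != ' ' && b != '\t') {
                junk++;
            }
        }
        return junk;
    }

    public static void main(String[] args) {
        byte[] clean = "%PDF-1.4\n...\n%%EOF\n".getBytes(StandardCharsets.US_ASCII);
        byte[] dirty = "%PDF-1.4\n...\n%%EOF\nxyz".getBytes(StandardCharsets.US_ASCII);
        System.out.println(trailingJunkBytes(clean)); // prints 0
        System.out.println(trailingJunkBytes(dirty)); // prints 3
    }
}
```

A check like this only reports that trailing bytes exist; deciding what to do with them (ignore them, rebuild the xref table, reject the file) is a separate question.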



> IOException: Negative seek offset in NonSequentialPDFParser
> -----------------------------------------------------------
>
>                 Key: PDFBOX-2527
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2527
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Andreas Lehmkühler
>            Priority: Minor
>             Fix For: 2.0.0
>
>         Attachments: PDFBOX-2527-069020.pdf
>
>
> {code}
> Exception in thread "main" java.io.IOException: Negative seek offset
> 	at java.io.RandomAccessFile.seek(Native Method)
> 	at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.seek(RandomAccessBufferedFileInputStream.java:116)
> 	at org.apache.pdfbox.io.PushBackInputStream.seek(PushBackInputStream.java:234)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:492)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:1013)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:951)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:897)
> 	at org.apache.pdfbox.tools.PDFReader.parseDocument(PDFReader.java:375)
> 	at org.apache.pdfbox.tools.PDFReader.openPDFFile(PDFReader.java:340)
> 	at org.apache.pdfbox.tools.PDFReader.main(PDFReader.java:326)
> 	at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:80)
> {code}
> This happens with several malformed PDFs from the test set in TIKA-1442. These files (303385, 069020, 303385, 742141, 982996) all have some trash at the end.
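
The top frame of the trace above is java.io.RandomAccessFile.seek, which throws an IOException when it is given a negative position. A minimal sketch of checking an offset before seeking follows; the class SeekGuardDemo and the method seekIfValid are hypothetical and are not how PDFBox handles this. Rejecting offsets beyond the file length is an assumption made for this example, since RandomAccessFile itself allows seeking past the end.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical demo, not PDFBox code: validate an offset before calling seek().
public class SeekGuardDemo {
    /** Seeks only if the offset lies within the file; returns false otherwise. */
    public static boolean seekIfValid(RandomAccessFile file, long offset) throws IOException {
        if (offset < 0 || offset > file.length()) {
            return false; // an offset read from a damaged file may be nonsense
        }
        file.seek(offset);
        return true;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("seek-demo", ".bin");
        tmp.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(tmp, "rw")) {
            raf.write(new byte[] {1, 2, 3, 4});
            System.out.println(seekIfValid(raf, 2));  // prints true
            System.out.println(seekIfValid(raf, -5)); // prints false
            try {
                raf.seek(-5); // unguarded call with a negative position
            } catch (IOException e) {
                System.out.println("unguarded seek failed: " + e.getMessage());
            }
        }
    }
}
```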


