Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2014/11/30 00:19:12 UTC

[jira] [Commented] (PDFBOX-2527) IOException: Negative seek offset in NonSequentialPDFParser

    [ https://issues.apache.org/jira/browse/PDFBOX-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14228975#comment-14228975 ] 

Andreas Lehmkühler commented on PDFBOX-2527:
--------------------------------------------

It looks like the PDF is truncated somewhere in the middle. I'm working on an improved self-repair, but until the parser is able to skip the corrupt parts, the file won't render.
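
As a rough illustration of the kind of defensive check involved, the sketch below looks for the last "startxref" keyword in a file and rejects an offset that is missing or points outside the file, instead of passing it to seek(). This is not PDFBox code: the class StartXrefCheck and the method findXrefOffset are hypothetical names, and it assumes the failure comes from seeking to an offset read from a damaged or truncated file, which has not been confirmed for this issue.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: validates the xref offset before any seek.
public class StartXrefCheck {
    private static final byte[] KEYWORD = "startxref".getBytes(StandardCharsets.ISO_8859_1);

    /** Returns the offset after the last "startxref", or -1 if it is absent or implausible. */
    public static long findXrefOffset(byte[] pdf) {
        // Search backwards so that trailing garbage after %%EOF is ignored.
        for (int i = pdf.length - KEYWORD.length; i >= 0; i--) {
            if (matchesAt(pdf, i)) {
                long offset = parseDigits(pdf, i + KEYWORD.length);
                // An offset outside the file would make a later seek fail.
                return (offset >= 0 && offset < pdf.length) ? offset : -1;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int k = 0; k < KEYWORD.length; k++) {
            if (data[pos + k] != KEYWORD[k]) {
                return false;
            }
        }
        return true;
    }

    private static long parseDigits(byte[] data, int pos) {
        // Skip whitespace between the keyword and the number.
        while (pos < data.length
                && (data[pos] == ' ' || data[pos] == '\r' || data[pos] == '\n' || data[pos] == '\t')) {
            pos++;
        }
        if (pos >= data.length || data[pos] < '0' || data[pos] > '9') {
            return -1; // no number follows the keyword
        }
        long value = 0;
        int digits = 0;
        // Cap at 18 digits so the accumulated value cannot overflow a long.
        while (pos < data.length && data[pos] >= '0' && data[pos] <= '9' && digits < 18) {
            value = value * 10 + (data[pos] - '0');
            pos++;
            digits++;
        }
        return value;
    }

    public static void main(String[] args) {
        byte[] sample = "%PDF-1.4\nxref\n0 0\ntrailer\n<<>>\nstartxref\n9\n%%EOF\ngarbage"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(findXrefOffset(sample)); // prints 9
    }
}
```

A caller could treat a return value of -1 as a signal to fall back to a repair strategy, such as scanning the file for objects, rather than seeking.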

> IOException: Negative seek offset in NonSequentialPDFParser
> -----------------------------------------------------------
>
>                 Key: PDFBOX-2527
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2527
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.8.8, 2.0.0
>            Reporter: Tilman Hausherr
>            Priority: Minor
>         Attachments: PDFBOX-2527-069020.pdf
>
>
> {code}
> Exception in thread "main" java.io.IOException: Negative seek offset
> 	at java.io.RandomAccessFile.seek(Native Method)
> 	at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.seek(RandomAccessBufferedFileInputStream.java:116)
> 	at org.apache.pdfbox.io.PushBackInputStream.seek(PushBackInputStream.java:234)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.initialParse(NonSequentialPDFParser.java:492)
> 	at org.apache.pdfbox.pdfparser.NonSequentialPDFParser.parse(NonSequentialPDFParser.java:1013)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:951)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:897)
> 	at org.apache.pdfbox.tools.PDFReader.parseDocument(PDFReader.java:375)
> 	at org.apache.pdfbox.tools.PDFReader.openPDFFile(PDFReader.java:340)
> 	at org.apache.pdfbox.tools.PDFReader.main(PDFReader.java:326)
> 	at org.apache.pdfbox.tools.PDFBox.main(PDFBox.java:80)
> {code}
> This happens with several malformed PDFs from the test set in TIKA-1442. These files (303385, 069020, 742141, 982996) all have some trailing garbage at the end.


