Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2017/07/15 10:34:01 UTC

[jira] [Commented] (PDFBOX-3870) Wrong type of referenced length in COSParser

    [ https://issues.apache.org/jira/browse/PDFBOX-3870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088546#comment-16088546 ] 

Tilman Hausherr commented on PDFBOX-3870:
-----------------------------------------

Something is messed up with that file: startxref is 42461, but the file is only 37934 bytes long, so the cross-reference offset points past the end of the file. The file is very recent, but it was created with Apache FOP 1.0, which is an old release. Maybe tell your client to try the current version. Or find out whether a block of data was lost without anyone noticing.
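The check described above (startxref offset vs. file size) can be reproduced without PDFBox by reading the number after the last "startxref" keyword and comparing it with the file length. This is a minimal sketch, assuming the file still ends with a conventional "startxref" / offset / "%%EOF" trailer; the class and method names (StartxrefCheck, startxrefOffset, offsetPastEnd) are hypothetical and not part of PDFBox.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a PDFBox API: compares the startxref offset with the file size.
public class StartxrefCheck {

    // Returns the offset written after the last "startxref" keyword, or -1 if none is found.
    static long startxrefOffset(byte[] pdf) {
        // ISO-8859-1 maps every byte to one char, so string indices equal byte offsets.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        int keyword = text.lastIndexOf("startxref");
        if (keyword < 0) {
            return -1;
        }
        int i = keyword + "startxref".length();
        // Skip PDF whitespace (NUL, TAB, LF, FF, CR, SPACE) between the keyword and the number.
        while (i < text.length() && "\0\t\n\f\r ".indexOf(text.charAt(i)) >= 0) {
            i++;
        }
        int start = i;
        while (i < text.length() && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
            i++;
        }
        return i > start ? Long.parseLong(text.substring(start, i)) : -1;
    }

    // A valid offset must point inside the file; an offset at or past the end cannot.
    static boolean offsetPastEnd(byte[] pdf) {
        long offset = startxrefOffset(pdf);
        return offset >= pdf.length;
    }

    public static void main(String[] args) throws IOException {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("startxref = " + startxrefOffset(pdf) + ", file size = " + pdf.length);
        if (offsetPastEnd(pdf)) {
            System.out.println("startxref points past the end of the file; data was probably lost");
        }
    }
}
```

For the attached file this would be expected to report startxref 42461 against a size of 37934. Only the last startxref is read because that is the one a reader starts from when a file has incremental updates.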

> Wrong type of referenced length in COSParser
> --------------------------------------------
>
>                 Key: PDFBOX-3870
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3870
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.6
>            Reporter: Jorge Spinsanti
>         Attachments: COSParserIOException.pdf
>
>
> I got an exception when extracting text from a PDF with Tika (the exception is thrown in PDFBox code):
> {code}
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.pdf.PDFParser@2be78cf6
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:286)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> Caused by: java.io.IOException: Wrong type of referenced length object COSObject{11, 0}: COSNull
> 	at org.apache.pdfbox.pdfparser.COSParser.getLength(COSParser.java:908)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseCOSStream(COSParser.java:950)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseFileObject(COSParser.java:781)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:742)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseObjectDynamically(COSParser.java:673)
> 	at org.apache.pdfbox.pdfparser.COSParser.parseDictObjects(COSParser.java:633)
> 	at org.apache.pdfbox.pdfparser.PDFParser.initialParse(PDFParser.java:241)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:276)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1132)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1066)
> 	at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:141)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	... 24 more
> {code}
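The message "Wrong type of referenced length object COSObject{11, 0}: COSNull" says the stream's /Length is an indirect reference to object 11 0 that resolved to null. The PDF specification treats a reference to an undefined object as a reference to the null object, so one plausible reading, consistent with lost data, is that object 11 is simply not in the file. The sketch below checks the raw bytes for an "11 0 obj" header. It is a plain-text scan under that assumption: an object stored inside a compressed object stream would not be found this way. ObjectHeaderCheck and containsObjectHeader are hypothetical names, not PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox API: looks for an "N G obj" header in the raw bytes.
public class ObjectHeaderCheck {

    // True if the bytes contain an "<objNum> <gen> obj" header for the given object.
    // Objects stored inside compressed object streams will not be found by this scan.
    static boolean containsObjectHeader(byte[] pdf, int objNum, int gen) {
        // ISO-8859-1 keeps a 1:1 byte-to-char mapping, so binary stream data decodes safely.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        // The lookbehind stops "111 0 obj" from matching when looking for object 11.
        Pattern header = Pattern.compile("(?<![0-9])" + objNum + "\\s+" + gen + "\\s+obj\\b");
        return header.matcher(text).find();
    }

    public static void main(String[] args) throws Exception {
        byte[] pdf = Files.readAllBytes(Paths.get(args[0]));
        // Object 11, generation 0, as named by COSObject{11, 0} in the stack trace.
        boolean found = containsObjectHeader(pdf, 11, 0);
        System.out.println(found
                ? "\"11 0 obj\" is present in the file"
                : "no \"11 0 obj\" header found; the /Length object may be missing");
    }
}
```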



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
