Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2012/05/22 00:29:41 UTC
[jira] [Resolved] (PDFBOX-773) expected='obj' actual='o' error while parsing the attached PDF

     [ https://issues.apache.org/jira/browse/PDFBOX-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-773.
--------------------------------

    Resolution: Won't Fix
      Assignee: Timo Boehme

The provided PDF document is broken.

Typically this kind of problem stems from the sequential parsing done by PDFParser and can be resolved by using NonSequentialPDFParser (option -nonSeq in some tools). However, the provided document is broken, at least in its xref sections: in several places an xref line is split by an extra \n. NonSequentialPDFParser will point you to the problematic offset.
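Why a single extra \n matters: the PDF specification (ISO 32000-1, section 7.5.4) requires each xref entry to be exactly 20 bytes long: a 10-digit offset, a space, a 5-digit generation number, a space, 'n' or 'f', and a 2-byte end-of-line marker. A reader that consumes entries as fixed-width records falls out of step at the first split line. The sketch below is a hypothetical helper (`XrefEntryCheck.firstBadEntryOffset` is not part of PDFBox) that assumes the caller already knows the byte offset where a subsection's entries start and how many entries it declares.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not a PDFBox API.
final class XrefEntryCheck {
    // 10-digit offset, space, 5-digit generation, space, 'n' or 'f',
    // then a 2-byte EOL ("\r\n", " \n" or " \r"): 20 bytes in total.
    private static final Pattern ENTRY =
            Pattern.compile("\\d{10} \\d{5} [nf](\r\n| \n| \r)");

    /**
     * Returns the byte offset of the first entry in an xref subsection that
     * does not match the fixed 20-byte layout, or -1 if all entries match.
     * 'start' is the offset just after the subsection header line.
     */
    static int firstBadEntryOffset(byte[] pdf, int start, int count) {
        for (int i = 0; i < count; i++) {
            int pos = start + i * 20;
            if (pos + 20 > pdf.length) {
                return pos; // truncated entry
            }
            // ISO-8859-1 maps every byte to one char, so lengths stay equal.
            String entry = new String(pdf, pos, 20, StandardCharsets.ISO_8859_1);
            if (!ENTRY.matcher(entry).matches()) {
                return pos;
            }
        }
        return -1;
    }
}
```

Once one entry is split, every following entry is shifted as well, so only the first bad offset is reported.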

Other readers silently try to reconstruct the object references, which might result in content errors. PDFBox has no dedicated mode for reconstructing the object structure; there is only the standard PDFParser. An xref repair tool of this kind would make it possible to parse even broken documents with NonSequentialPDFParser. This, however, would be a feature request. The provided document would at least be a good test case for such a tool.
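For illustration only, a common way for readers to reconstruct object references is to ignore the damaged xref table and scan the raw bytes for indirect object headers of the form "N G obj", recording where each one starts. The sketch below assumes that approach; `XrefRebuilder` is a hypothetical name and nothing here is PDFBox API. It is deliberately naive: a header-like sequence inside a binary stream would be a false positive, and headers that do not start a line are missed.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of xref reconstruction by scanning; not a PDFBox API.
final class XrefRebuilder {
    // An indirect object header such as "12 0 obj" at the start of a line.
    private static final Pattern OBJ_HEADER =
            Pattern.compile("(?m)^(\\d+)\\s+(\\d+)\\s+obj\\b");

    /**
     * Maps "objectNumber generation" to the byte offset of the object header.
     * A later definition of the same object replaces an earlier one, roughly
     * matching how incremental updates append newer versions of objects.
     */
    static Map<String, Integer> rebuild(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so a char index
        // in the string equals the byte offset in the file.
        String text = new String(pdf, StandardCharsets.ISO_8859_1);
        Map<String, Integer> offsets = new TreeMap<>();
        Matcher m = OBJ_HEADER.matcher(text);
        while (m.find()) {
            offsets.put(m.group(1) + " " + m.group(2), m.start());
        }
        return offsets;
    }
}
```

A real repair mode would also have to rebuild the trailer and cope with object streams, which is part of why it is a feature of its own.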
                
> expected='obj' actual='o' error while parsing the attached PDF
> --------------------------------------------------------------
>
>                 Key: PDFBOX-773
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-773
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 1.2.0, 1.3.1, 1.6.0
>         Environment: Sun JDK 6u21, Windows 7 x86
>            Reporter: Marin Nozhchev
>            Assignee: Timo Boehme
>         Attachments: Andersens_Fairy_Tales.zip, test_with_1.6.0_full.txt
>
>
> Parsing the attached PDF fails with the following error:
> Caused by: java.io.IOException: expected='obj' actual='o' org.apache.pdfbox.io.PushBackInputStream@11d75b9
> 	at org.apache.pdfbox.pdfparser.PDFParser.parseObject(PDFParser.java:509)
> 	at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:179)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:859)
> 	at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:826)
>         ...
> The same error appears with the 1.1 and 1.2 releases and with the latest 1.3 trunk so far (svn rev. 962879).
> The file opens without warnings or any visible issues in the latest versions of Foxit Reader and Acrobat Reader on Windows. The parsing was done via the Apache Tika Parser.
> Thank you

--
This message is automatically generated by JIRA.