Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2012/06/13 16:21:43 UTC

[jira] [Assigned] (PDFBOX-1099) Only parsing object streams if they are referenced by the xref table / stream

     [ https://issues.apache.org/jira/browse/PDFBOX-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme reassigned PDFBOX-1099:
-----------------------------------

    Assignee: Timo Boehme

While NonSequentialPDFParser works correctly, the issue still exists with PDFParser. Furthermore, we have to use the XREF information to decide whether an object in an object stream is referenced by the XREF table or not (currently, if an object already exists, the object from the object stream is skipped).
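
To illustrate the kind of check meant here, the following is a minimal sketch and not PDFBox code. The class XrefObjStreamFilter and its methods are hypothetical names. It assumes the type 2 entries of the xref stream have already been decoded: per the PDF specification, such an entry gives the number of the object stream that holds the object and the index of the object within that stream.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of PDFBox: remembers which object stream
// the xref information names for each compressed object.
class XrefObjStreamFilter {

    // object number -> number of the object stream the xref says contains it
    private final Map<Long, Long> objToStream = new HashMap<>();

    // Record one decoded type 2 xref entry. The index within the object
    // stream is accepted for completeness but not needed for the check.
    void addType2Entry(long objectNumber, long objStreamNumber, int indexInStream) {
        objToStream.put(objectNumber, objStreamNumber);
    }

    // True if at least one xref entry points into this object stream,
    // i.e. the stream is worth parsing at all.
    boolean isObjStreamReferenced(long objStreamNumber) {
        return objToStream.containsValue(objStreamNumber);
    }

    // True if the xref says this object lives in exactly this object stream.
    // Objects found in a stream without a matching entry should be ignored.
    boolean shouldKeep(long objectNumber, long objStreamNumber) {
        Long expected = objToStream.get(objectNumber);
        return expected != null && expected == objStreamNumber;
    }
}
```

With something like this, a parser could skip any object stream for which isObjStreamReferenced returns false, and for the remaining streams keep only the objects for which shouldKeep returns true, instead of skipping an object merely because one with the same number already exists.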
                
> Only parsing object streams if they are referenced by the xref table / stream
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-1099
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1099
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Thomas Chojecki
>            Assignee: Timo Boehme
>
> Some PDF documents contain object streams that are not referenced through the xref table / stream. To prevent the stream parser from dereferencing such object streams, we need to implement the type 2 part (case 2) inside the PDFXRefStreamParser and store the objects inside a map. This will take some load off the stream parser (see PDFBOX-1098) and cause fewer failures while parsing a document.
> A sample PDF can be obtained from issue PDFBOX-1098, and a patch is coming soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira