Posted to dev@pdfbox.apache.org by "Timo Boehme (JIRA)" <ji...@apache.org> on 2012/06/14 00:14:42 UTC

[jira] [Resolved] (PDFBOX-1099) Only parsing object streams if they are referenced by the xref table / stream

     [ https://issues.apache.org/jira/browse/PDFBOX-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Boehme resolved PDFBOX-1099.
---------------------------------

       Resolution: Fixed
    Fix Version/s: 1.8.0

Set this to resolved; the patch was applied in rev. 1350033. The patch does not do exactly what the improvement description asks for. It only fixes a bug in which the wrong object was chosen because XREF table information was not used when parsing object streams. The reason for not following the original feature description is as follows. The intended functionality is already provided by NonSequentialPDFParser. PDFParser will increasingly take the role of a fallback for cases where the xref information is wrong. There, skipping object streams that are not referenced by XREF could make broken documents unreadable that could be read before. The patch at least ensures that only those objects in an object stream that the XREF references overwrite objects read elsewhere.
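
To illustrate the overwrite rule described above, here is a minimal sketch in plain Java. It is not the code from the patch and does not use PDFBox classes: the class name, the method name and the use of simple maps with String placeholders for objects are all assumptions made for illustration. It also assumes that objects not referenced by the XREF are still kept when no other copy exists, since the comment only says that they must not overwrite objects read elsewhere.

```java
import java.util.Map;

// Hypothetical sketch, not PDFBox code: every name here is illustrative.
class ObjStreamMergeSketch {

    /**
     * Merges the objects parsed from one object stream into the object pool.
     *
     * @param pool          object number -> object already read elsewhere
     * @param xrefObjStream object number -> number of the object stream that
     *                      holds it, according to the xref table / stream
     * @param streamNumber  object number of the object stream being merged
     * @param parsed        object number -> object parsed from that stream
     */
    static void mergeObjectStream(Map<Long, String> pool,
                                  Map<Long, Long> xrefObjStream,
                                  long streamNumber,
                                  Map<Long, String> parsed) {
        for (Map.Entry<Long, String> entry : parsed.entrySet()) {
            Long owner = xrefObjStream.get(entry.getKey());
            if (owner != null && owner == streamNumber) {
                // The xref says this object lives in this stream:
                // it may replace a copy read elsewhere.
                pool.put(entry.getKey(), entry.getValue());
            } else {
                // Assumption: an unreferenced object is kept only if
                // nothing else provided that object number.
                pool.putIfAbsent(entry.getKey(), entry.getValue());
            }
        }
    }
}
```

With this rule, a stale copy of an object inside an unreferenced object stream can no longer replace the copy the XREF points to, while documents with incomplete xref information still get all objects that can be found.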
                
> Only parsing object streams if they are referenced by the xref table / stream
> -----------------------------------------------------------------------------
>
>                 Key: PDFBOX-1099
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1099
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Parsing
>            Reporter: Thomas Chojecki
>            Assignee: Timo Boehme
>             Fix For: 1.8.0
>
>         Attachments: 2012-06-13_COSDocument_xrefObjStream.patch
>
>
> Some PDF documents contain object streams that are not referenced through the xref table / stream. To prevent the stream parser from dereferencing such object streams, we need to implement the type 2 part (case 2) inside the PDFXRefStreamParser and store the objects in a map. This will take some load off the stream parser (see PDFBOX-1098) and cause fewer failures while parsing a document.
> A sample PDF can be obtained from issue PDFBOX-1098, and a patch is coming soon.
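
For readers unfamiliar with the "type 2" entries mentioned in the description: in a cross-reference stream, each entry has three fields whose byte widths are given by the /W array. A first field of 2 marks a compressed object, with the second field holding the object number of the containing object stream and the third field holding the index within that stream. The following is a rough sketch of collecting those entries into a map. It is not the PDFXRefStreamParser implementation; the class and method names are hypothetical, and it assumes the stream data is already decoded and covers a single contiguous range of object numbers starting at firstObjNr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not PDFBox code: every name here is illustrative.
class XrefType2Sketch {

    /** Reads a big-endian unsigned integer of the given byte width. */
    static long readField(byte[] data, int offset, int width) {
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[offset + i] & 0xFF);
        }
        return value;
    }

    /**
     * Scans decoded xref stream data and returns, for every type 2 entry,
     * a mapping object number -> object number of its object stream.
     *
     * @param data       decoded content of the cross-reference stream
     * @param w          the three field widths from the /W array
     * @param firstObjNr object number of the first entry
     */
    static Map<Long, Long> collectCompressed(byte[] data, int[] w, long firstObjNr) {
        Map<Long, Long> result = new LinkedHashMap<>();
        int entrySize = w[0] + w[1] + w[2];
        if (entrySize == 0) {
            return result; // nothing to read, and avoids an endless loop
        }
        long objNr = firstObjNr;
        for (int pos = 0; pos + entrySize <= data.length; pos += entrySize) {
            // A type field of width 0 means the default type 1.
            long type = (w[0] == 0) ? 1 : readField(data, pos, w[0]);
            if (type == 2) {
                long objStreamNr = readField(data, pos + w[0], w[1]);
                result.put(objNr, objStreamNr);
            }
            objNr++;
        }
        return result;
    }
}
```

A parser could consult such a map before dereferencing an object stream and skip, or treat with lower priority, any stream that no entry points to.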
