Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2018/04/05 17:49:00 UTC

[jira] [Resolved] (PDFBOX-4097) Compressed objects are lost when brute force search fails to handle compressed streams

     [ https://issues.apache.org/jira/browse/PDFBOX-4097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-4097.
----------------------------------------
    Resolution: Fixed

The attached file can now be opened, and I've checked some other files as well. Setting this to resolved.

> Compressed objects are lost when brute force search fails to handle compressed streams
> --------------------------------------------------------------------------------------
>
>                 Key: PDFBOX-4097
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4097
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 2.0.8
>            Reporter: Cheng Zhong
>            Assignee: Andreas Lehmkühler
>            Priority: Major
>             Fix For: 2.0.10, 3.0.0 PDFBox
>
>         Attachments: 奥美医疗-IPO.pdf
>
>
> Compressed objects described in cross-reference streams are lost when the brute-force search fails to handle the object streams that contain them.
> The attached PDF has an object 1336, but its offset points to object 1828. This inconsistency triggers a brute-force search (initiated by *COSParser.checkXrefOffsets*).
> During the search (in *bfSearchForObjStreams*), object streams 1828, 1829 and 1830 fail to decompress because they are "corrupted" (the *Params* entry is missing from the dictionary, or the *Filter* is wrong). As a result, 462 compressed objects described in the cross-reference streams are lost. Because important objects (the Root, the Pages, etc.) refer to objects stored in 1828 and the other affected streams, those references all resolve to null (the corrected xref offsets no longer contain them), and further parsing is impossible.
> However, when I bypassed *checkXrefOffsets*, the PDF displayed correctly without any noticeable errors. Object 1336 does not appear to be used in the PDF.
> The "corrupted" object stream 1828:
> {code:java}
> 1828 0 obj
> <<
> /Length 2176
> /Type /ObjStm
> /N 200
> /First 2103
> /Filter /FlatDecode
> >>
> ...
> {code}
> This stream fails in *bfSearchForObjStreams* but is parsed successfully by *parseObjectStream*.
>  
> Would it make sense to add a fallback that preserves the offsets of compressed objects (keyed by object key) from the cross-reference streams when the brute-force search runs into an error?
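Background on how compressed objects are "described in cross-reference streams": per the PDF specification (ISO 32000-1), each cross-reference stream entry is a fixed-width binary record whose field widths come from the /W array, and a type 2 entry marks a compressed object by giving the number of the object stream that holds it plus its index within that stream. The following is a minimal, hypothetical sketch of decoding such entries from already-decompressed data; it is not PDFBox code, the class and method names are made up, and it assumes a single /Index subsection starting at `firstObjNum`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: list the compressed (type 2) entries of a decoded xref stream. */
public class XrefStreamEntries {

    /** Reads a big-endian unsigned field of the given width; a width of 0 yields the default. */
    static long readField(byte[] data, int pos, int width, long defaultValue) {
        if (width == 0) {
            return defaultValue;
        }
        long value = 0;
        for (int i = 0; i < width; i++) {
            value = (value << 8) | (data[pos + i] & 0xFF);
        }
        return value;
    }

    /**
     * Returns object number -> {object stream number, index within that stream}
     * for every type 2 entry. Assumes one subsection starting at firstObjNum.
     */
    static Map<Integer, long[]> compressedEntries(byte[] data, int[] w, int firstObjNum) {
        int entrySize = w[0] + w[1] + w[2];
        if (entrySize == 0) {
            throw new IllegalArgumentException("/W describes zero-width entries");
        }
        Map<Integer, long[]> result = new LinkedHashMap<>();
        int objNum = firstObjNum;
        for (int pos = 0; pos + entrySize <= data.length; pos += entrySize, objNum++) {
            // The type field defaults to 1 (uncompressed) when W[0] is 0.
            long type = readField(data, pos, w[0], 1);
            if (type == 2) {
                long streamNum = readField(data, pos + w[0], w[1], 0);
                long index = readField(data, pos + w[0] + w[1], w[2], 0);
                result.put(objNum, new long[] {streamNum, index});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // W = [1 2 1]: object 0 is free, object 1 is uncompressed at offset 15,
        // object 2 is index 5 of object stream 1828 (0x0724). Illustrative data only.
        byte[] data = {0, 0, 0, 0, 1, 0, 15, 0, 2, 0x07, 0x24, 5};
        long[] entry = compressedEntries(data, new int[] {1, 2, 1}, 0).get(2);
        System.out.println("object 2 -> stream " + entry[0] + ", index " + entry[1]);
        // prints "object 2 -> stream 1828, index 5"
    }
}
```

Entries like these only tell the parser which object stream to open; the object itself still has to be read out of that stream.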

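Why one failed decompression loses every object in that stream: per the PDF specification, the object numbers of the objects stored in an object stream appear only inside its (compressed) data, as /N pairs of integers (object number, byte offset relative to /First). For stream 1828 that header would be 200 pairs in the first 2103 decoded bytes, so a brute-force search cannot learn which objects live there unless the stream decodes. A minimal, hypothetical sketch of reading that header from decoded bytes (names are illustrative, not PDFBox's):

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: read the header of a decoded object stream (/Type /ObjStm). */
public class ObjStmHeader {

    /** Returns object number -> absolute offset of that object inside the decoded data. */
    static Map<Integer, Integer> parseHeader(byte[] decoded, int n, int first) {
        if (first < 0 || first > decoded.length) {
            throw new IllegalArgumentException("/First " + first + " is outside the decoded data");
        }
        // The header is whitespace-separated ASCII integers before /First;
        // ISO-8859-1 maps each byte to exactly one char.
        String header = new String(decoded, 0, first, StandardCharsets.ISO_8859_1).trim();
        String[] tokens = header.isEmpty() ? new String[0] : header.split("\\s+");
        if (tokens.length < 2 * n) {
            throw new IllegalArgumentException("expected " + 2 * n + " integers, found " + tokens.length);
        }
        Map<Integer, Integer> offsets = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            int objNum = Integer.parseInt(tokens[2 * i]);
            int relativeOffset = Integer.parseInt(tokens[2 * i + 1]);
            offsets.put(objNum, first + relativeOffset);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Two objects: 10 is the integer 42, 11 is the string (hi); /N 2, /First 10.
        byte[] decoded = "10 0 11 3 42 (hi)".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseHeader(decoded, 2, 10)); // prints {10=10, 11=13}
    }
}
```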

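The fallback suggested in the issue could look roughly like this: start from the brute-force results and, for each compressed entry read from the cross-reference streams that brute force did not recover, keep the original entry as long as its object stream itself was located. This is a hypothetical sketch, not necessarily how the issue was fixed in PDFBox; encoding compressed entries as negative object stream numbers is a convention of this sketch only, and all offsets in `main` are made up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of the suggested fallback; not PDFBox code. */
public class XrefFallback {

    /**
     * Values >= 0 are byte offsets of uncompressed objects; a negative value -s means
     * "compressed, stored in object stream s" (a convention of this sketch only).
     */
    static Map<Integer, Long> merge(Map<Integer, Long> xrefEntries, Map<Integer, Long> bruteForceEntries) {
        // Brute-force results win wherever they exist.
        Map<Integer, Long> merged = new HashMap<>(bruteForceEntries);
        for (Map.Entry<Integer, Long> e : xrefEntries.entrySet()) {
            long value = e.getValue();
            if (value >= 0) {
                continue; // uncompressed xref offsets are the ones found to be inconsistent
            }
            int streamNum = (int) -value;
            Long streamOffset = bruteForceEntries.get(streamNum);
            // Keep the compressed entry only if its object stream was located in the file.
            if (streamOffset != null && streamOffset >= 0) {
                merged.putIfAbsent(e.getKey(), value);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Illustrative values only.
        Map<Integer, Long> xref = new HashMap<>();
        xref.put(1336, 5000L); // inconsistent uncompressed offset
        xref.put(1, -1828L);   // an object stored in object stream 1828
        xref.put(2, -1828L);
        Map<Integer, Long> bruteForce = new HashMap<>();
        bruteForce.put(1336, 4870L);
        bruteForce.put(2, 7000L);
        bruteForce.put(1828, 9000L); // the stream was found even though it could not be decoded
        System.out.println(new TreeMap<>(merge(xref, bruteForce)));
        // prints {1=-1828, 2=7000, 1336=4870, 1828=9000}
    }
}
```

Uncompressed entries from the original cross-reference data are deliberately not carried over, since their offsets are what triggered the brute-force search in the first place; compressed entries are resolved later through their object stream, so they remain usable as long as that stream can be parsed by the regular path.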

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
