Posted to dev@pdfbox.apache.org by "Son (JIRA)" <ji...@apache.org> on 2008/11/02 15:33:44 UTC

[jira] Commented: (PDFBOX-383) BaseParser incorrectly handling stream, exhibiting IOException

    [ https://issues.apache.org/jira/browse/PDFBOX-383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644589#action_12644589 ] 

Son commented on PDFBOX-383:
----------------------------

We should note that the sample file was generated by PDFBox itself, using the code shipped with FOP 0.94.

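The problem arises when the stream is written. The current PDFBox implementation first writes the Length entry as an indirect reference, then the content, and only then the length object, so a parser reading the file sequentially does not yet know the length when it reaches the content.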
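
To make that write order concrete, here is a minimal sketch in plain Java. It does not use PDFBox classes; `IndirectLengthWriterSketch` and `writeStream` are hypothetical names, and the object numbers 12 and 16 are simply taken from the example in the issue description.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class IndirectLengthWriterSketch {

    // Hypothetical helper, not PDFBox API: serializes one stream object whose
    // /Length is an indirect reference, then the length object itself.
    static byte[] writeStream(int streamObj, int lengthObj, byte[] content) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // 1. Dictionary with /Length given only as a reference.
        put(out, streamObj + " 0 obj\n<< /Length " + lengthObj + " 0 R\n>>\nstream\n");
        // 2. The content, whose size the reader cannot know yet.
        out.write(content, 0, content.length);
        put(out, "\nendstream\nendobj\n");
        // 3. The length object, written after the content.
        put(out, lengthObj + " 0 obj\n" + content.length + "\nendobj\n");
        return out.toByteArray();
    }

    private static void put(ByteArrayOutputStream out, String s) {
        byte[] b = s.getBytes(StandardCharsets.ISO_8859_1);
        out.write(b, 0, b.length);
    }

    public static void main(String[] args) {
        byte[] content = "abc endstream xyz".getBytes(StandardCharsets.ISO_8859_1);
        byte[] pdf = writeStream(12, 16, content);
        System.out.println(new String(pdf, StandardCharsets.ISO_8859_1));
    }
}
```

A reader that scans forward from `stream` meets the content before object 16, which is why it falls back to searching for 'endstream'.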


> BaseParser incorrectly handling stream, exhibiting IOException
> --------------------------------------------------------------
>
>                 Key: PDFBOX-383
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-383
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>    Affects Versions: 0.7.3
>         Environment: pdfbox 0.73 with java 5 running on windows platform
>            Reporter: Son
>         Attachments: BaseParser.java, fail.pdf
>
>
> When loading a PDF file that contains a file attachment annotation, errors may occur when two conditions hold:
> - the Length value in the dictionary of the F stream is an indirect reference to an integer value
> - the content of the filtered stream contains the word 'endstream'
> Typically this occurs when the PDF file contains a stream description like the following:
> 12 0 obj
> << /Length 16 0 R
> /Filter /FlateDecode
> >>
> stream
> {content}
> endstream
> endobj
> ...
> 16 0 obj
> {length}
> endobj
> ....
> and the filtered {content} contains the string "endstream".
> (see line 3700 of the attachment)
> The problem is related to the way stream content is (always) read by the method readUntilEndStream(), which stops at the end of the first 'endstream' sequence.
> A (partial) fix was made that reads the stream content in three different ways:
> - if the Length is known (it is a direct object), {length} bytes are read and written to the stream's FilteredStream
> - if the Length is unknown and the filter is FlateFilter, the code decodes the data (the FlateDecode algorithm does not require knowing the length of the encoded data ahead of time) and assigns the result to the stream's unfiltered stream
> - otherwise, the current behavior is kept
> Running the modified code on the files that exhibited errors fixed the problems that were encountered.
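
The three read paths in the quoted description could be sketched roughly as follows. This is not the attached BaseParser.java; `StreamBodyReaderSketch`, `readBody` and `deflate` are hypothetical names, and the Flate path assumes zlib-wrapped data as handled by `java.util.zip.Inflater`.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

final class StreamBodyReaderSketch {
    private static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.ISO_8859_1);

    // directLength is null when /Length is an indirect reference.
    // In the Flate case the returned bytes are decoded; otherwise they are raw.
    static byte[] readBody(byte[] file, int start, Integer directLength, boolean flate) {
        if (directLength != null) {
            // 1. Length known: copy exactly that many bytes.
            return Arrays.copyOfRange(file, start, start + directLength);
        }
        if (flate) {
            // 2. Length unknown, Flate data: the compressed data marks its own end.
            Inflater inflater = new Inflater();
            try {
                inflater.setInput(file, start, file.length - start);
                ByteArrayOutputStream decoded = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                while (!inflater.finished()) {
                    int n = inflater.inflate(buf);
                    if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                        throw new IllegalStateException("truncated or unsupported Flate data");
                    }
                    decoded.write(buf, 0, n);
                }
                return decoded.toByteArray();
            } catch (DataFormatException e) {
                throw new IllegalStateException(e);
            } finally {
                inflater.end();
            }
        }
        // 3. Fallback (the previous behavior): stop at the first 'endstream'.
        int end = indexOf(file, ENDSTREAM, start);
        return Arrays.copyOfRange(file, start, end < 0 ? file.length : end);
    }

    private static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = from; i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    // Test helper: zlib-compresses a buffer.
    static byte[] deflate(byte[] plain) {
        Deflater d = new Deflater();
        d.setInput(plain);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "abc endstream xyz\nendstream\nendobj\n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(new String(readBody(raw, 0, 17, false), StandardCharsets.ISO_8859_1));
        System.out.println(new String(readBody(raw, 0, null, false), StandardCharsets.ISO_8859_1));
    }
}
```

With a direct length the whole body is returned even though it contains 'endstream', while the fallback path cuts it short at that word. That is the failure the report describes, and it remains for streams that are neither length-prefixed nor Flate-encoded.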

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.