Posted to dev@pdfbox.apache.org by "Thomas Chojecki (JIRA)" <ji...@apache.org> on 2012/05/22 15:47:42 UTC
[jira] [Commented] (PDFBOX-1098) Wrong implemented stream reader
[ https://issues.apache.org/jira/browse/PDFBOX-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280955#comment-13280955 ]
Thomas Chojecki commented on PDFBOX-1098:
-----------------------------------------
One duplicate (PDFBOX-1106) is still open/unresolved.
> Wrong implemented stream reader
> -------------------------------
>
> Key: PDFBOX-1098
> URL: https://issues.apache.org/jira/browse/PDFBOX-1098
> Project: PDFBox
> Issue Type: Bug
> Components: Parsing
> Reporter: Thomas Chojecki
> Assignee: Timo Boehme
> Priority: Critical
> Labels: ZLIB
> Fix For: 1.7.0
>
>
> The BaseParser#readUntilEndStream(OutputStream) method parses streams incorrectly. [1]
> The method reads a stream until the keyword "endstream" is reached and ignores the Length value in the stream dictionary. This implementation breaks nearly every PDF document that has another PDF embedded inside a stream [2].
> Encoders used for compressing streams can be block-based (like FlateDecode, which is the most commonly used). If compressing a block of data would not save space, the encoder leaves that block uncompressed and marks it as such. A stream can therefore contain both compressed and uncompressed parts. If someone embeds a PDF document that itself contains streams inside a stream, the encoder will leave most parts of the document uncompressed. Such parts can contain plain text like "endstream" or other critical keywords that cause the parser to stop.
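The effect described above can be reproduced outside PDFBox with java.util.zip alone. The following is a minimal sketch, not PDFBox code: the class and method names (StoredBlockDemo, buildPayload, deflate, inflate, contains) are made up for illustration, and the payload is pseudo-random noise standing in for an already compressed inner stream. It places the text "endstream" inside incompressible data, deflates it, and checks whether the keyword is still visible as raw bytes in the compressed output.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical demo class; none of these names exist in PDFBox.
final class StoredBlockDemo {

    static final byte[] KEYWORD = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Pseudo-random (incompressible) bytes with "endstream" written into the middle. */
    static byte[] buildPayload() {
        byte[] noise = new byte[8192];
        new Random(42).nextBytes(noise); // fixed seed so the run is repeatable
        System.arraycopy(KEYWORD, 0, noise, 4000, KEYWORD.length);
        return noise;
    }

    /** Compresses the input with the default zlib settings. */
    static byte[] deflate(byte[] input) {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompresses at most expectedLength bytes; stops early on truncated input. */
    static byte[] inflate(byte[] input, int expectedLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        byte[] result = new byte[expectedLength];
        int off = 0;
        try {
            while (off < expectedLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, expectedLength - off);
                if (n == 0 && !inflater.finished()) {
                    break; // no progress: input is truncated or needs a dictionary
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return Arrays.copyOf(result, off);
    }

    /** Returns true if needle occurs as a contiguous byte sequence in haystack. */
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] payload = buildPayload();
        byte[] compressed = deflate(payload);
        System.out.println("payload bytes:    " + payload.length);
        System.out.println("compressed bytes: " + compressed.length);
        System.out.println("keyword visible in compressed data: " + contains(compressed, KEYWORD));
        System.out.println("round trip ok: " + Arrays.equals(payload, inflate(compressed, payload.length)));
    }
}
```

With a standard zlib build the noise should end up in a stored (uncompressed) block, so the keyword survives verbatim in the encoded stream. A parser that scans for "endstream" would cut the data at that point and hand a truncated stream to the inflater.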
> So we need to read the full stream length that is written in the dictionary and not look for "endstream" keywords until that end is reached.
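The difference between the two strategies can be sketched on a plain byte array. This is an illustration only, not the PDFBox implementation or the upcoming patch: StreamExtentDemo, readUntilKeyword and readByLength are hypothetical names, the sample data is invented, and the sketch assumes the parser is already positioned at the first byte of stream data and that the Length value is known and correct.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; it only contrasts the two reading strategies.
final class StreamExtentDemo {

    static final byte[] ENDSTREAM = "endstream".getBytes(StandardCharsets.US_ASCII);

    /** Invented stream data: a tiny inner object that itself contains "endstream". */
    static byte[] sampleBody() {
        String inner = "1 0 obj\n<< /Length 3 >>\nstream\nabc\nendstream\nendobj\n";
        return inner.getBytes(StandardCharsets.US_ASCII);
    }

    /** The outer stream data followed by an EOL and the real closing keyword. */
    static byte[] sampleFile() {
        byte[] body = sampleBody();
        byte[] tail = "\nendstream\n".getBytes(StandardCharsets.US_ASCII);
        byte[] file = Arrays.copyOf(body, body.length + tail.length);
        System.arraycopy(tail, 0, file, body.length, tail.length);
        return file;
    }

    /** Returns true if the keyword starts at position pos in data. */
    static boolean keywordAt(byte[] data, int pos) {
        if (pos < 0 || pos + ENDSTREAM.length > data.length) {
            return false;
        }
        for (int j = 0; j < ENDSTREAM.length; j++) {
            if (data[pos + j] != ENDSTREAM[j]) {
                return false;
            }
        }
        return true;
    }

    /** Keyword scan: stops at the first "endstream", wherever it occurs. */
    static byte[] readUntilKeyword(byte[] data, int start) {
        for (int i = start; i < data.length; i++) {
            if (keywordAt(data, i)) {
                return Arrays.copyOfRange(data, start, i);
            }
        }
        throw new IllegalArgumentException("no endstream keyword found");
    }

    /** Length-based read: takes exactly length bytes, then expects EOL(s) and "endstream". */
    static byte[] readByLength(byte[] data, int start, int length) {
        if (start < 0 || length < 0 || start + length > data.length) {
            throw new IllegalArgumentException("Length runs past the end of the data");
        }
        int pos = start + length;
        // The end-of-line marker before the keyword is not counted in Length.
        while (pos < data.length && (data[pos] == '\r' || data[pos] == '\n')) {
            pos++;
        }
        if (!keywordAt(data, pos)) {
            throw new IllegalArgumentException("endstream expected after " + length + " bytes");
        }
        return Arrays.copyOfRange(data, start, start + length);
    }

    public static void main(String[] args) {
        byte[] body = sampleBody();
        byte[] file = sampleFile();
        byte[] scanned = readUntilKeyword(file, 0);
        byte[] exact = readByLength(file, 0, body.length);
        System.out.println("declared length:    " + body.length);
        System.out.println("keyword scan read:  " + scanned.length + " bytes");
        System.out.println("length-based read:  " + exact.length + " bytes");
        System.out.println("length-based read matches data: " + Arrays.equals(body, exact));
    }
}
```

The keyword scan returns only the bytes in front of the inner "endstream", while the length-based read returns the complete data and uses the keyword purely as a check after the declared length has been consumed. A real parser would still need a fallback for files whose Length value is missing or wrong, which this sketch leaves out.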
> The current stream parser causes a ZIPException with the message "Unexpected end of ZLIB input stream".
> A sample PDF and a patch are coming soon.
> [1] PDF 32000-1:2008 -> 7.3.8.2 Stream Extent
> [2] PDF 32000-1:2008 -> 7.11.4 Embedded File Streams