Posted to dev@pdfbox.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2015/04/17 18:47:59 UTC

[jira] [Commented] (PDFBOX-2762) remove parseCOSStream() call from PDFStreamParser

    [ https://issues.apache.org/jira/browse/PDFBOX-2762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500167#comment-14500167 ] 

ASF subversion and git services commented on PDFBOX-2762:
---------------------------------------------------------

Commit 1674353 from [~tilman] in branch 'pdfbox/trunk'
[ https://svn.apache.org/r1674353 ]

PDFBOX-2762: remove parseCOSStream() call from PDFStreamParser, because there are no streams in content streams

> remove parseCOSStream() call from PDFStreamParser
> -------------------------------------------------
>
>                 Key: PDFBOX-2762
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2762
>             Project: PDFBox
>          Issue Type: Task
>          Components: Parsing
>    Affects Versions: 2.0.0
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>             Fix For: 2.0.0
>
>
> This code is found in PDFStreamParser:
> {code}
>                 if (c == '<')
>                 {
>                     COSDictionary pod = parseCOSDictionary();
>                     skipSpaces();
>                     if ((char)pdfSource.peek() == 's')
>                     {
>                         retval = parseCOSStream( pod );
>                     }
>                     else
>                     {
>                         retval = pod;
>                     }
>                 }
> {code}
> This is incorrect. PDFStreamParser parses content streams, and a content stream cannot contain a stream object. The spec requires that "All streams shall be indirect objects", and an indirect object is whatever sits between obj and endobj. Indirect objects, in turn, are not allowed in content streams: "Indirect objects and object references shall not be permitted at all". So parseCOSStream() should never be called from here, and the new code will be
> {code}
>                 if (c == '<')
>                 {
>                     retval = parseCOSDictionary();
>                 }
> {code}
> To be sure, I ran my own test set and the digitalcorpora set (250000 files) to check whether parseCOSStream() is ever called from PDFStreamParser. It isn't. How did this incorrect code end up there? I don't know, but it has been there since 2002:
> http://pdfbox.cvs.sourceforge.net/viewvc/pdfbox/pdfbox/src/org/pdfbox/pdfparser/PDFStreamParser.java?revision=1.1&view=markup
> Why do I care about this? It is related to a mailing list post by Andrea Vacondio, who mentioned that there are several versions of parseCOSStream(), so I'm trying to clean up.
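
The check described in the issue (does a dictionary in a content stream ever get followed by the "stream" keyword?) can be approximated without PDFBox. The following is a minimal sketch, not the test that was actually run: the class name ContentStreamCheck and the method dictionaryFollowedByStream are hypothetical, and the scanner is deliberately naive. It tracks nested "<<" / ">>" pairs only and does not skip literal strings, comments, or inline image data, so it can report false positives on such input.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of PDFBox: a naive scan of raw content stream bytes.
public class ContentStreamCheck {

    // White-space characters as defined by the PDF spec: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int c) {
        return c == 0 || c == 9 || c == 10 || c == 12 || c == 13 || c == 32;
    }

    // True if the bytes at 'offset' begin with the ASCII keyword.
    static boolean startsWith(byte[] data, int offset, String keyword) {
        byte[] k = keyword.getBytes(StandardCharsets.US_ASCII);
        if (offset + k.length > data.length) {
            return false;
        }
        for (int i = 0; i < k.length; i++) {
            if (data[offset + i] != k[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns true if any outermost dictionary is followed, after optional
    // white space, by "stream". This is the only situation in which the old
    // PDFStreamParser branch would have called parseCOSStream().
    static boolean dictionaryFollowedByStream(byte[] data) {
        int depth = 0;
        int i = 0;
        while (i < data.length - 1) {
            if (data[i] == '<' && data[i + 1] == '<') {
                depth++;
                i += 2;
            } else if (data[i] == '>' && data[i + 1] == '>') {
                i += 2;
                // Only look ahead when this ">>" closes an outermost dictionary.
                if (depth > 0 && --depth == 0) {
                    int j = i;
                    while (j < data.length && isWhitespace(data[j])) {
                        j++;
                    }
                    if (startsWith(data, j, "stream")) {
                        return true;
                    }
                }
            } else {
                i++;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // A marked-content property list, which is a legitimate dictionary in a content stream.
        byte[] content = "/Span << /MCID 0 >> BDC\nBT /F1 12 Tf (Hello) Tj ET\nEMC"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(dictionaryFollowedByStream(content));
    }
}
```

Dictionaries do occur in content streams, for example as property lists for marked content, which is why the parseCOSDictionary() call stays. Only the look-ahead for a following stream is removed.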



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
