Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/06/26 23:07:26 UTC

[jira] [Commented] (PDFBOX-2163) inline image with EI in the middle incorrectly parsed

    [ https://issues.apache.org/jira/browse/PDFBOX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045177#comment-14045177 ] 

Tilman Hausherr commented on PDFBOX-2163:
-----------------------------------------

And another:
http://digitalcorpora.org/corp/nps/files/govdocs1/258/258126.pdf

This time the EI sequences are in the middle of the image data. I think we should use an alternative parsing strategy for ASCII85-encoded inline images, e.g. assuming that the EI is on a separate line.
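
A minimal sketch of the separate-line idea, in Java since PDFBox is a Java project. The class and method names (Ascii85InlineImageEnd, findEndOfImage) are hypothetical and are not PDFBox API. The sketch assumes the writer put the closing EI at the start of a line. An ASCII85 line that happens to consist of just "EI" would still fool it.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper, not PDFBox API: locates the EI operator that ends an
 * ASCII85-encoded inline image, assuming that EI starts a line of its own.
 */
public class Ascii85InlineImageEnd {

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP
    private static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /**
     * Returns the offset of the 'E' of the first "EI" at or after start that is
     * preceded by CR or LF and followed by white space (or the end of the data),
     * or -1 if there is no such EI.
     */
    public static int findEndOfImage(byte[] data, int start) {
        for (int i = Math.max(start, 1); i + 1 < data.length; i++) {
            if (data[i] != 'E' || data[i + 1] != 'I') {
                continue;
            }
            byte prev = data[i - 1];
            boolean startsLine = prev == '\n' || prev == '\r';
            boolean endsToken = i + 2 == data.length || isWhitespace(data[i + 2]);
            if (startsLine && endsToken) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // ASCII85-like payload with a stray "EI" glued to the end of its first line
        String stream = "ID\n9jqo^EI\nBOu!rD]j7\n~>\nEI Q";
        byte[] data = stream.getBytes(StandardCharsets.US_ASCII);
        // Skips the stray EI at offset 8 and finds the real one at offset 24
        System.out.println(findEndOfImage(data, 3)); // prints 24
    }
}
```

The ASCII85Decode filter terminates its data with the ~> end-of-data marker. Checking that ~> precedes the candidate EI, ignoring white space, could therefore serve as an additional sanity check.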

> inline image with EI in the middle incorrectly parsed
> -----------------------------------------------------
>
>                 Key: PDFBOX-2163
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2163
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Tilman Hausherr
>
> This PDF
> http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
> throws an exception because the end of an inline image is detected incorrectly. The stream looks like this:
> {code}
> BI
>   /W 452
>   /H 169
>   /BPC 8
>   /CS /RGB
>   /D [0.0 1.0 0.0 1.0 0.0 1.0]
>   /F [/A85 /Fl]
> ID
> ......................................................
> ....................................................EI
> ......................................................
> ...
> ....
> EI Q
> {code}
> Inline images are handled in PDFStreamParser. This is tricky: we look at the data following an EI to check that it isn't binary data, i.e. that the EI isn't one in the middle of the image. Here, however, the image data isn't binary, it's ASCII85. We also can't require an LF before the EI, because I remember a PDF at work, created by a well-known company, that doesn't have one.
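
The follow-up check described above can be illustrated with a simplified, hypothetical Java sketch. FollowUpDataCheck and acceptsAsEnd are illustrative names, and this is not PDFBox's actual implementation. A candidate EI is accepted when it is followed by white space and the next few bytes don't look binary. The main method shows why this rejects a stray EI inside raw binary data but accepts one inside ASCII85 data, whose output is printable ASCII by design.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical simplification of a "followed by binary data?" check; not PDFBox code. */
public class FollowUpDataCheck {

    // PDF white-space characters: NUL, HT, LF, FF, CR, SP
    private static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /** True if b is neither PDF white space nor printable ASCII. */
    private static boolean isBinary(int b) {
        return !(isWhitespace(b) || (b >= 0x21 && b <= 0x7E));
    }

    /**
     * Accepts the "EI" at offset ei as the end of the image if it is followed by
     * white space and none of the next lookahead bytes look binary.
     */
    public static boolean acceptsAsEnd(byte[] data, int ei, int lookahead) {
        int after = ei + 2;
        if (after < data.length && !isWhitespace(data[after] & 0xFF)) {
            return false;
        }
        int limit = Math.min(data.length, after + 1 + lookahead);
        for (int i = after + 1; i < limit; i++) {
            if (isBinary(data[i] & 0xFF)) {
                return false; // looks like more image data, so this EI is in the middle
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Raw (e.g. Flate) data: the stray EI is followed by non-ASCII bytes and rejected
        byte[] binary = {'E', 'I', '\n', (byte) 0x9C, (byte) 0xE5, 0x01};
        System.out.println(acceptsAsEnd(binary, 0, 8)); // prints false
        // ASCII85 data: the stray EI is followed by printable text and wrongly accepted
        byte[] a85 = "EI\n9jqo^BlbD-BleB1DJ+*+F(f,q".getBytes(StandardCharsets.US_ASCII);
        System.out.println(acceptsAsEnd(a85, 0, 8)); // prints true
    }
}
```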



--
This message was sent by Atlassian JIRA
(v6.2#6252)