Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/06/26 22:15:25 UTC

[jira] [Updated] (PDFBOX-2163) inline image with EI in the middle incorrectly parsed

     [ https://issues.apache.org/jira/browse/PDFBOX-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-2163:
------------------------------------

    Summary: inline image with EI in the middle incorrectly parsed  (was: inline image with EI in die middle incorrectly parsed)

> inline image with EI in the middle incorrectly parsed
> -----------------------------------------------------
>
>                 Key: PDFBOX-2163
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2163
>             Project: PDFBox
>          Issue Type: Bug
>          Components: Parsing
>            Reporter: Tilman Hausherr
>
> This PDF
> http://digitalcorpora.org/corp/nps/files/govdocs1/876/876636.pdf
> causes an exception because the end of an inline image is detected incorrectly. The stream looks like this:
> {code}
> BI
>   /W 452
>   /H 169
>   /BPC 8
>   /CS /RGB
>   /D [0.0 1.0 0.0 1.0 0.0 1.0]
>   /F [/A85 /Fl]
> ID
> ......................................................
> ....................................................EI
> ......................................................
> ...
> ....
> EI Q
> {code}
> Inline images are handled in PDFStreamParser. This is tricky: after a candidate EI we look at the bytes that follow and, if they are binary data, conclude that the EI is still in the middle of the image. Here, however, the image data is not binary but ASCII85-encoded text, so that check does not help. We also can't require an LF before the EI, because I remember a PDF at work, created by a well-known company, that doesn't use one.
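
One possible direction, shown only as a rough sketch and not as what PDFStreamParser does or will do: ASCII85 uses the printable characters '!' through 'u' (plus 'z'), so both 'E' and 'I' are legal data bytes, which is why an EI can turn up inside the image. ASCII85 data is, however, terminated by the end-of-data marker "~>". When the first filter is /A85, a parser could skip ahead to that marker and accept only an EI that follows it. The class and method names below are hypothetical, and the sketch assumes that the encoder actually wrote "~>" with no whitespace between the two characters.

```java
// Hypothetical sketch, not PDFBox code: locate the EI that ends an
// ASCII85-encoded inline image by first finding the "~>" end-of-data marker.
class InlineImageEndSketch {

    // PDF whitespace characters: NUL, TAB, LF, FF, CR, SPACE.
    static boolean isWhitespace(int b) {
        return b == 0 || b == 9 || b == 10 || b == 12 || b == 13 || b == 32;
    }

    /**
     * Returns the index of the 'E' of the terminating EI, or -1 if no EI
     * directly follows the first "~>" marker.
     *
     * @param data  raw content stream bytes
     * @param start index of the first image data byte (after ID and its whitespace)
     */
    static int findEndOfAscii85Image(byte[] data, int start) {
        for (int i = start; i + 1 < data.length; i++) {
            // Any EI seen before the marker is image data and is ignored.
            if (data[i] == '~' && data[i + 1] == '>') {
                int j = i + 2;
                // Whitespace between the marker and EI is optional (no LF required).
                while (j < data.length && isWhitespace(data[j])) {
                    j++;
                }
                boolean isEi = j + 1 < data.length
                        && data[j] == 'E' && data[j + 1] == 'I'
                        && (j + 2 == data.length || isWhitespace(data[j + 2]));
                return isEi ? j : -1;
            }
        }
        return -1; // no end-of-data marker: fall back to some other heuristic
    }
}
```

This would only cover images whose first filter is /A85; other encodings would still need the existing lookahead. It also does not rely on an LF before the EI.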



--
This message was sent by Atlassian JIRA
(v6.2#6252)