Posted to dev@pdfbox.apache.org by "Tilman Hausherr (JIRA)" <ji...@apache.org> on 2014/06/09 13:51:01 UTC

[jira] [Comment Edited] (PDFBOX-2120) Regression: Type 1 font corrupted

    [ https://issues.apache.org/jira/browse/PDFBOX-2120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021805#comment-14021805 ] 

Tilman Hausherr edited comment on PDFBOX-2120 at 6/9/14 11:50 AM:
------------------------------------------------------------------

Good news: it does not happen when using the -nonSeq option.

The sequential parser doesn't handle indirect lengths when the length object comes later in the file, so it reads the stream sequentially, looking for "endstream". This is NOT correct; it is at best a heuristic. PDFBOX-2079 removed a final CR LF or LF from such a stream, on the assumption that a PDF-writing application appends CR LF or LF before writing "endstream". In your file, when the length is taken into account, the stream ends with a CR, which is then followed by an LF. I remove both, so the font file ends with "cleartomark" without a final CR LF or LF, and Adobe doesn't like this.
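
To make the failure concrete, here is a minimal sketch of the trimming step described above. It is not the actual PDFBox code; the class and method names are made up for illustration, and it only assumes what is stated here: one trailing CR LF or LF is dropped from the bytes collected before "endstream".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical illustration only; not the PDFBox implementation. */
class EndstreamTrimSketch {

    /** Drops one trailing CR LF or LF from bytes collected by scanning for "endstream". */
    static byte[] stripFinalEol(byte[] data) {
        int end = data.length;
        if (end > 0 && data[end - 1] == '\n') {
            end--; // drop the LF
            if (end > 0 && data[end - 1] == '\r') {
                end--; // drop a CR directly before the LF
            }
        }
        return Arrays.copyOf(data, end);
    }

    public static void main(String[] args) {
        // Per /Length the stream ends with "cleartomark" + CR; the writer then
        // added an LF before "endstream", so the scan collects CR LF at the end.
        byte[] scanned = "cleartomark\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] trimmed = stripFinalEol(scanned);
        System.out.println(scanned.length); // 13
        System.out.println(trimmed.length); // 11, although /Length would give 12
    }
}
```

The CR that belongs to the font data is removed together with the LF that the writer appended, which is why "cleartomark" ends up as the very last bytes.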

The sequential parser is a dead end and nobody should use it, but it is still used because the non-sequential parser needs an extra parameter :-(

I am adding an exception for streams that are assumed to be ASCII, detected by testing the beginning of the stream as in PDFBOX-1164; these won't get the filtering that I introduced with PDFBOX-2079 and thus will keep a final CR LF or LF.
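
Roughly, the exception could look like the sketch below. This is only an illustration under assumptions: the names are hypothetical, and the number of bytes probed and the definition of "ASCII" (printable characters plus tab, CR, LF) are my guesses here, not necessarily what PDFBOX-1164 does.

```java
import java.util.Arrays;

/** Hypothetical illustration only; not the PDFBox implementation. */
class AsciiStreamExceptionSketch {

    /** Assumed probe size; the real check may look at a different number of bytes. */
    static final int PROBE_LENGTH = 16;

    /** Returns true if the first bytes are all printable ASCII or tab/CR/LF. */
    static boolean looksLikeAscii(byte[] data) {
        int n = Math.min(data.length, PROBE_LENGTH);
        if (n == 0) {
            return false;
        }
        for (int i = 0; i < n; i++) {
            int b = data[i] & 0xFF;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            boolean printable = b >= 0x20 && b <= 0x7E;
            if (!whitespace && !printable) {
                return false;
            }
        }
        return true;
    }

    /** Strips one final CR LF or LF, except for streams that look like ASCII. */
    static byte[] trimUnlessAscii(byte[] data) {
        if (looksLikeAscii(data)) {
            return data; // assumed ASCII stream: keep the final end-of-line
        }
        int end = data.length;
        if (end > 0 && data[end - 1] == '\n') {
            end--;
            if (end > 0 && data[end - 1] == '\r') {
                end--;
            }
        }
        return Arrays.copyOf(data, end);
    }
}
```

With this, a font file that starts with "%!PS-AdobeFont" keeps its final CR LF, while a stream that starts with binary data is still trimmed as before.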

This works for your file, but I will do some more tests.

In the future, the sequential parser should be deleted, and so should the filter that I introduced with PDFBOX-2079.


was (Author: tilman):
Good news: it does not happen when using the -nonSeq option.

The sequential parser doesn't handle indirect lengths when the length object comes later in the file, so it reads the stream sequentially, looking for "endstream". This is NOT correct; it is at best a heuristic. PDFBOX-2079 removed a final CR LF or LF from such a stream, on the assumption that a PDF-writing application appends CR LF or LF before writing "endstream". In your file, when the length is taken into account, the stream ends with a CR, which is then followed by an LF. I remove both, so the font file ends with "cleartomark" without a final CR LF or LF, and Adobe doesn't like this.

The sequential parser is a dead end and nobody should use it, but it is still used because the non-sequential parser needs an extra parameter :-(

I am adding an exception for PostScript streams (assuming that these start with "%!PS"); these won't get the filtering that I introduced with PDFBOX-2079 and thus will keep a final CR LF or LF.

This works for your file, but I will do some more tests.

In the future, the sequential parser should be deleted, and so should the filter that I introduced with PDFBOX-2079.

> Regression: Type 1 font corrupted
> ---------------------------------
>
>                 Key: PDFBOX-2120
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2120
>             Project: PDFBox
>          Issue Type: Bug
>            Reporter: simon steiner
>            Assignee: Tilman Hausherr
>              Labels: regression
>         Attachments: t1subset2.pdf
>
>
> You get a warning when opening the output in Adobe Reader
> Blank line after "cleartomark" missing in fontfile
> java -jar ~/pdf-box-svn/app/target/pdfbox-app-2.0.0-SNAPSHOT.jar WriteDecodedDoc t1subset2.pdf



--
This message was sent by Atlassian JIRA
(v6.2#6252)