Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2020/04/13 20:09:00 UTC

[jira] [Updated] (PDFBOX-4811) Glyphs getting lost when rendering

     [ https://issues.apache.org/jira/browse/PDFBOX-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tilman Hausherr updated PDFBOX-4811:
------------------------------------
    Description: 
I missed a rendering change (sorry) in the linked PDF.js issue that happened in PDFBOX-4810. It is not a regression, but rather a difference in how bad input is displayed, due to having different data.

The CMap has these ranges:
{code:java}
4 begincodespacerange
<00><7f>
<c080><dfbf>
<e08080><efbfbf>
<f0808080><f7bfbfbf>
endcodespacerange
{code}
The content stream has segments like:
{code:java}
(Check\340up Date:2020/ 3/ 4  11:46) Tj
{code}
\340 (octal) is 0xE0. The current code in CMap.readCode() reads bytes until a range fits, which means it reads 4 bytes before it notices that matching has failed. After the failure it doesn't reposition, so this is displayed as "Check -Date" instead of "Check -up Date", i.e. input is lost. The "-" is also a default glyph.

The solution is to remember the position and to reposition there. I'm using mark() and reset(), which, surprisingly, work both when loading into memory and when loading with a temp file.
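
The repositioning idea can be illustrated with a small standalone sketch. This is not the PDFBox code: the class ReadCodeSketch, its method names, the '?' placeholder, and the choice to return the first byte as the code after a failed match are all assumptions made for illustration. Only the codespace ranges and the sample string are taken from this issue.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class, not part of PDFBox or FontBox.
class ReadCodeSketch {
    // Codespace ranges from the CMap above: {low bytes, high bytes}.
    static final int[][][] RANGES = {
        {{0x00}, {0x7f}},
        {{0xc0, 0x80}, {0xdf, 0xbf}},
        {{0xe0, 0x80, 0x80}, {0xef, 0xbf, 0xbf}},
        {{0xf0, 0x80, 0x80, 0x80}, {0xf7, 0xbf, 0xbf, 0xbf}},
    };
    static final int MAX_LEN = 4;

    // True if the first len bytes fall inside a range of exactly that length.
    static boolean matches(int[] bytes, int len) {
        for (int[][] r : RANGES) {
            if (r[0].length != len) continue;
            boolean ok = true;
            for (int i = 0; i < len; i++) {
                if (bytes[i] < r[0][i] || bytes[i] > r[1][i]) { ok = false; break; }
            }
            if (ok) return true;
        }
        return false;
    }

    // Reads one code; returns -1 at end of input.
    // With reposition == false the bytes consumed by a failed match are lost.
    static int readCode(InputStream in, boolean reposition) throws IOException {
        in.mark(MAX_LEN); // remember where this code starts
        int[] bytes = new int[MAX_LEN];
        for (int len = 1; len <= MAX_LEN; len++) {
            int b = in.read();
            if (b < 0) {
                if (len == 1) return -1; // nothing left to read
                break;
            }
            bytes[len - 1] = b;
            if (matches(bytes, len)) {
                int code = 0;
                for (int i = 0; i < len; i++) code = (code << 8) | bytes[i];
                return code;
            }
        }
        if (reposition) {
            in.reset(); // go back to the remembered position
            in.read();  // consume only the first byte (assumption of this sketch)
        }
        return bytes[0];
    }

    // Decodes everything; codes above 0x7f are shown as '?' for simplicity.
    static String decodeAll(byte[] data, boolean reposition) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            StringBuilder sb = new StringBuilder();
            int code;
            while ((code = readCode(in, reposition)) >= 0) {
                sb.append(code <= 0x7f ? (char) code : '?');
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "Check\u00e0up Date".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeAll(data, false)); // prints "Check?Date"
        System.out.println(decodeAll(data, true));  // prints "Check?up Date"
    }
}
```

Without repositioning, the three bytes read after 0xE0 ("up ") are consumed by the failed match and never decoded, which corresponds to the lost input described above. With mark() and reset(), only the offending byte is consumed and the rest of the string survives.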


> Glyphs getting lost when rendering
> ----------------------------------
>
>                 Key: PDFBOX-4811
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-4811
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 2.0.19
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>            Priority: Major
>             Fix For: 2.0.20, 3.0.0 PDFBox
>


