Posted to dev@pdfbox.apache.org by Tilman Hausherr <TH...@t-online.de> on 2016/06/26 18:24:30 UTC

font.toUnicode() call in PDFStreamEngine

In PDFStreamEngine.showText there is a call to font.toUnicode(). IMHO 
this isn't needed. Its result is passed to

showGlyph(textRenderingMatrix, font, code, unicode, w);

This is used in PDFStreamEngine or PDFTextStreamEngine.

In PDFStreamEngine, showGlyph looks like this:

     protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
                              Vector displacement) throws IOException
     {
         if (font instanceof PDType3Font)
         {
             showType3Glyph(textRenderingMatrix, (PDType3Font)font, code, unicode, displacement);
         }
         else
         {
             showFontGlyph(textRenderingMatrix, font, code, unicode, displacement);
         }
     }

showType3Glyph doesn't use unicode, neither does showFontGlyph.

In PDFTextStreamEngine.showGlyph(), unicode is overwritten:

unicode = font.toUnicode(code, glyphList);


so it isn't used. I don't see a need to get the unicode at this time. 
Anybody who overrides one of the methods mentioned above can still get 
it later, because the PDFont is also passed.
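
To illustrate that last point, here is a minimal sketch. It does not use the 
real PDFBox classes: SketchFont, SketchEngine and SketchTextEngine are 
hypothetical stand-ins for PDFont, PDFStreamEngine and a text-extracting 
subclass, and the signatures are simplified. It only shows that a subclass 
can decode Unicode on demand from the font it is handed, while the plain 
rendering path does no lookups at all.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PDFont: maps a character code to a Unicode string.
class SketchFont {
    private final Map<Integer, String> cmap = new HashMap<>();
    int lookups = 0; // counts toUnicode calls, to make the saved work visible

    SketchFont() {
        cmap.put(1, "A");
        cmap.put(2, "B");
    }

    String toUnicode(int code) {
        lookups++;
        return cmap.get(code); // null when there is no mapping
    }
}

// Hypothetical stand-in for PDFStreamEngine with the unicode parameter removed.
class SketchEngine {
    public void showText(SketchFont font, int[] codes) {
        for (int code : codes) {
            showGlyph(font, code); // no toUnicode() call here any more
        }
    }

    protected void showGlyph(SketchFont font, int code) {
        // Rendering path: would draw the glyph and never needs Unicode.
    }
}

// A text-extracting subclass decodes on demand from the font it is handed.
class SketchTextEngine extends SketchEngine {
    final StringBuilder text = new StringBuilder();

    @Override
    protected void showGlyph(SketchFont font, int code) {
        String unicode = font.toUnicode(code);
        if (unicode != null) {
            text.append(unicode);
        }
    }
}

class LazyUnicodeSketch {
    public static void main(String[] args) {
        int[] codes = {1, 2, 3}; // code 3 has no mapping in SketchFont

        SketchFont renderFont = new SketchFont();
        new SketchEngine().showText(renderFont, codes);
        System.out.println("rendering lookups: " + renderFont.lookups);

        SketchFont textFont = new SketchFont();
        SketchTextEngine extractor = new SketchTextEngine();
        extractor.showText(textFont, codes);
        System.out.println("extracted: " + extractor.text + ", lookups: " + textFont.lookups);
    }
}
```

In this sketch the rendering engine performs zero lookups, so a missing 
mapping can only be noticed (and warned about) by code that actually asks 
for the text.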


So my intent is to remove the parameter in the trunk (i.e. for 2.1) in 
the 4 methods mentioned. This will make processing very slightly faster 
and remove the "WARNING: No Unicode mapping for ... in font ..." 
messages in rendering when toUnicode is missing.

Tilman

Re: font.toUnicode() call in PDFStreamEngine

Posted by John Hewson <jo...@jahewson.com>.

> On 27 Jun 2016, at 08:46, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> On 27.06.2016 at 15:30, John Hewson wrote:
>>> On 26 Jun 2016, at 11:24, Tilman Hausherr <TH...@t-online.de> wrote:
>>> 
>>> In PDFStreamEngine.showText there is a call to font.toUnicode(). IMHO this isn't needed. It is passed to
>>> 
>>> showGlyph(textRenderingMatrix, font, code, unicode, w);
>>> 
>>> This is used in PDFStreamEngine or PDFTextStreamEngine.
>>> 
>>> In PDFStreamEngine, showGlyph looks like this:
>>> 
>>>    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
>>>                             Vector displacement) throws IOException
>>>    {
>>>        if (font instanceof PDType3Font)
>>>        {
>>>            showType3Glyph(textRenderingMatrix, (PDType3Font)font, code, unicode, displacement);
>>>        }
>>>        else
>>>        {
>>>            showFontGlyph(textRenderingMatrix, font, code, unicode, displacement);
>>>        }
>>>    }
>>> 
>>> showType3Glyph doesn't use unicode, neither does showFontGlyph.
>>> 
>>> In PDFTextStreamEngine.showGlyph(), unicode is overwritten:
>>> 
>>> unicode = font.toUnicode(code, glyphList);
>> But PDFTextStreamEngine is for legacy compatibility only. All “proper” text processing is now in PDFStreamEngine, which is why Unicode is decoded there.
> 
> But what about rendering? It doesn't need the font.toUnicode() result.

That's true, it doesn't. But we can't get rid of it without a breaking API change. Is it really worth it?
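
A rough sketch of why the change is breaking, using hypothetical stand-in classes (EngineAfterChange and DownstreamEngine are not PDFBox types, and the signatures are simplified). A subclass written against the old signature either stops compiling, if it uses @Override, or silently stops being called, because its method becomes an unrelated overload:

```java
// Hypothetical stand-in for the base engine after the parameter is removed.
class EngineAfterChange {
    final StringBuilder log = new StringBuilder();

    // New signature: the unicode parameter is gone.
    protected void showGlyph(int code) {
        log.append("base:").append(code).append(' ');
    }

    public void showText(int[] codes) {
        for (int code : codes) {
            showGlyph(code);
        }
    }
}

// Hypothetical downstream subclass written against the old signature.
class DownstreamEngine extends EngineAfterChange {
    // With @Override this method would no longer compile. Without it,
    // it is only an overload that the base class never invokes.
    protected void showGlyph(int code, String unicode) {
        log.append("downstream:").append(unicode).append(' ');
    }
}

class BreakingChangeSketch {
    static String run() {
        DownstreamEngine engine = new DownstreamEngine();
        engine.showText(new int[] {65, 66});
        return engine.log.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(run()); // only the base implementation runs
    }
}
```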

-- John

>> 
>>> so it isn't used. I don't see a need to get the unicode at this time. Anybody who overrides one of the methods mentioned above can still get it later, because the PDFont is also passed.
>>> 
>>> 
>>> So my intent is to remove the parameter in the trunk (i.e. for 2.1) in the 4 methods mentioned. This will make processing very slightly faster and remove the "WARNING: No Unicode mapping for ... in font ..." messages in rendering when toUnicode is missing.
>> That would break downstream subclasses of PDFTextStreamEngine.
> 
> Yes of course, that's why I wanted to discuss this first, and wanted to do this in 2.1 only, not in 2.0.*.
> 
> Tilman
> 
> 
>> 
>>> Tilman
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@pdfbox.apache.org
>> For additional commands, e-mail: dev-help@pdfbox.apache.org
> 
> 


Re: font.toUnicode() call in PDFStreamEngine

Posted by Tilman Hausherr <TH...@t-online.de>.

On 27.06.2016 at 15:30, John Hewson wrote:
>> On 26 Jun 2016, at 11:24, Tilman Hausherr <TH...@t-online.de> wrote:
>>
>> In PDFStreamEngine.showText there is a call to font.toUnicode(). IMHO this isn't needed. It is passed to
>>
>> showGlyph(textRenderingMatrix, font, code, unicode, w);
>>
>> This is used in PDFStreamEngine or PDFTextStreamEngine.
>>
>> In PDFStreamEngine, showGlyph looks like this:
>>
>>     protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
>>                              Vector displacement) throws IOException
>>     {
>>         if (font instanceof PDType3Font)
>>         {
>>             showType3Glyph(textRenderingMatrix, (PDType3Font)font, code, unicode, displacement);
>>         }
>>         else
>>         {
>>             showFontGlyph(textRenderingMatrix, font, code, unicode, displacement);
>>         }
>>     }
>>
>> showType3Glyph doesn't use unicode, neither does showFontGlyph.
>>
>> In PDFTextStreamEngine.showGlyph(), unicode is overwritten:
>>
>> unicode = font.toUnicode(code, glyphList);
> But PDFTextStreamEngine is for legacy compatibility only. All “proper” text processing is now in PDFStreamEngine, which is why Unicode is decoded there.

But what about rendering? It doesn't need the font.toUnicode() result.

>
>> so it isn't used. I don't see a need to get the unicode at this time. Anybody who overrides one of the methods mentioned above can still get it later, because the PDFont is also passed.
>>
>>
>> So my intent is to remove the parameter in the trunk (i.e. for 2.1) in the 4 methods mentioned. This will make processing very slightly faster and remove the "WARNING: No Unicode mapping for ... in font ..." messages in rendering when toUnicode is missing.
> That would break downstream subclasses of PDFTextStreamEngine.

Yes of course, that's why I wanted to discuss this first, and wanted to 
do this in 2.1 only, not in 2.0.*.

Tilman


>
>> Tilman
>>
>



Re: font.toUnicode() call in PDFStreamEngine

Posted by John Hewson <jo...@jahewson.com>.

> On 26 Jun 2016, at 11:24, Tilman Hausherr <TH...@t-online.de> wrote:
> 
> In PDFStreamEngine.showText there is a call to font.toUnicode(). IMHO this isn't needed. It is passed to
> 
> showGlyph(textRenderingMatrix, font, code, unicode, w);
> 
> This is used in PDFStreamEngine or PDFTextStreamEngine.
> 
> In PDFStreamEngine, showGlyph looks like this:
> 
>    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
>                             Vector displacement) throws IOException
>    {
>        if (font instanceof PDType3Font)
>        {
>            showType3Glyph(textRenderingMatrix, (PDType3Font)font, code, unicode, displacement);
>        }
>        else
>        {
>            showFontGlyph(textRenderingMatrix, font, code, unicode, displacement);
>        }
>    }
> 
> showType3Glyph doesn't use unicode, neither does showFontGlyph.
> 
> In PDFTextStreamEngine.showGlyph(), unicode is overwritten:
> 
> unicode = font.toUnicode(code, glyphList);

But PDFTextStreamEngine is for legacy compatibility only. All “proper” text processing is now in PDFStreamEngine, which is why Unicode is decoded there.

> so it isn't used. I don't see a need to get the unicode at this time. Anybody who overrides one of the methods mentioned above can still get it later, because the PDFont is also passed.
> 
> 
> So my intent is to remove the parameter in the trunk (i.e. for 2.1) in the 4 methods mentioned. This will make processing very slightly faster and remove the "WARNING: No Unicode mapping for ... in font ..." messages in rendering when toUnicode is missing.

That would break downstream subclasses of PDFTextStreamEngine.

> Tilman
> 

