Posted to fop-dev@xmlgraphics.apache.org by mehdi houshmand <me...@gmail.com> on 2011/03/17 17:27:22 UTC

CID Font Question

Hi Guys,

I found an issue with a TrueType font in PDF output. I have attached a
PDF with the possible bug (buggy.pdf) and one with my "fix" (fixed.pdf).
The issue is that if you copy/paste the text set in the normal-weight
font (top line) of the PDF, the " " (space) and "!" (exclamation mark)
characters come out as Unicode \uFFFF.

Initially I thought this was a bug in the font, so I looked at the
"cmap" table in the font to see which Unicode values these glyphs were
mapped to, and I found that they were the 2nd and 3rd entries in the
"cmap" table. This piqued my curiosity because all the fonts I
remember (and I checked a couple to be sure) have the first 3 glyphs
mapped to \u0000 or \uFFFF, and their CID is .notdef. The bold
version of the same font (in both PDFs) works fine and, as expected,
its first 3 glyphs are mapped to \uFFFF, \u0000 and \u0000
respectively.
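
For anyone who wants to repeat that check, here is a minimal sketch of
the reverse lookup. It assumes the "cmap" has already been read into a
plain code point -> glyph index map; the class name CmapCheck and the
sample entries are made up for illustration and are not taken from the
actual font or from FOP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CmapCheck {

    /** Returns every code point that the cmap sends to the given glyph index. */
    public static List<Integer> codePointsFor(Map<Integer, Integer> cmap, int glyphIndex) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : cmap.entrySet()) {
            if (e.getValue() == glyphIndex) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical cmap: space and "!" sit on glyph indices 1 and 2.
        Map<Integer, Integer> cmap = new TreeMap<>();
        cmap.put(0x20, 1);
        cmap.put(0x21, 2);
        cmap.put(0x41, 36);

        // List what, if anything, is mapped to the first three glyphs.
        for (int gid = 0; gid < 3; gid++) {
            System.out.println("glyph " + gid + " <- " + codePointsFor(cmap, gid));
        }
    }
}
```

An empty list for glyph indices 1 and 2 would correspond to the usual
layout; real characters showing up there is the case described above.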

I also checked the code base, and o.a.f.fonts.CIDSubset contains the
following lines of code:

    /**
     * Adds the initial 3 glyphs which are the same for all CID subsets.
     */
    public void setupFirstThreeGlyphs() {
        // Make sure that the 3 first glyphs are included
        usedGlyphs.put(new Integer(0), new Integer(0));
        usedGlyphsIndex.put(new Integer(0), new Integer(0));
        usedGlyphsCount++;
        usedGlyphs.put(new Integer(1), new Integer(1));
        usedGlyphsIndex.put(new Integer(1), new Integer(1));
        usedGlyphsCount++;
        usedGlyphs.put(new Integer(2), new Integer(2));
        usedGlyphsIndex.put(new Integer(2), new Integer(2));
        usedGlyphsCount++;
    }

So I checked the specification, and nowhere does it suggest that the
first THREE are reserved. It does, however, say that CID 0 should be
.notdef (see the quote below, from p340 of the PDF spec).

"Every CIDFont must contain a glyph description for CID 0, which is
analogous to the .notdef character name in simple fonts (see “Handling
Undefined Characters” on page 355)."

My question is this: is this a FOP bug, or is it a bug in the font
we're using? If it's a FOP bug, I'd be more than happy to fix it
(delete the 6 lines and change the method name). If, however, it's a
font bug, then which spec should I be looking at, and what exactly is
the bug? I should also mention that I started with the TTF spec, and it
doesn't suggest that any glyphs are reserved.

Any help on this would be very much appreciated,

Mehdi

Re: CID Font Question

Posted by mehdi houshmand <me...@gmail.com>.
Excellent. I looked back and traced it to the same commit, which, as
you can imagine, perplexed me further.

Well, I'll post a fix soon.

Mehdi

On 18 March 2011 14:59, Jeremias Maerki <de...@jeremias-maerki.ch> wrote:
> Hi Mehdi
>
> Interesting problem. Apparently, the overwhelming majority of TrueType
> fonts map glyph index 1 to ".null" and glyph index 2 to "nonmarkingreturn"
> (for carriage returns and such). Your version of the Frutiger 45 Light
> font apparently doesn't but has "space" on glyph index 1.
>
> Those three "blind" indices have been in FOP's codebase since the
> addition of CID subsets in 2001:
> http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/org/apache/fop/render/pdf/fonts/MultiByteFont.java?r1=194167&r2=194168&pathrev=195822&
>
> Funny that something like this shows up after so much time.
>
> Anyway, since we only need to embed a .notdef as index 0 and the glyphs
> we really need I think it is safe to remove the fixed indices 1 and
> 2 since this whole thing then seems to work for both kinds of fonts.
>
> I'd say, put your patch in Bugzilla!
>
> Jeremias Maerki

Re: CID Font Question

Posted by Jeremias Maerki <de...@jeremias-maerki.ch>.
Hi Mehdi

Interesting problem. Apparently, the overwhelming majority of TrueType
fonts map glyph index 1 to ".null" and glyph index 2 to "nonmarkingreturn"
(for carriage returns and such). Your version of the Frutiger 45 Light
font apparently doesn't but has "space" on glyph index 1. 

Those three "blind" indices have been in FOP's codebase since the
addition of CID subsets in 2001:
http://svn.apache.org/viewvc/xmlgraphics/fop/trunk/src/org/apache/fop/render/pdf/fonts/MultiByteFont.java?r1=194167&r2=194168&pathrev=195822&

Funny that something like this shows up after so much time.

Anyway, since we only need to embed a .notdef as index 0 and the glyphs
we really need I think it is safe to remove the fixed indices 1 and
2 since this whole thing then seems to work for both kinds of fonts.
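
A rough sketch of the idea, using a simplified model rather than the
actual CIDSubset code: the class SubsetSketch, its fields and its
methods are all hypothetical. It also assumes that a slot reserved up
front never gets a Unicode value recorded for it, which is only a guess
at why copy/paste ends up with U+FFFF and is not confirmed from the FOP
source.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified, hypothetical model of a CID subset; not FOP's CIDSubset. */
public class SubsetSketch {

    /** Original glyph index -> subset glyph index. */
    private final Map<Integer, Integer> usedGlyphs = new LinkedHashMap<>();
    /** Subset glyph index -> Unicode character. */
    private final Map<Integer, Character> usedChars = new LinkedHashMap<>();

    /** Reserves the first reservedCount glyph indices up front. */
    public SubsetSketch(int reservedCount) {
        for (int i = 0; i < reservedCount; i++) {
            usedGlyphs.put(i, i); // reserved slot, no Unicode value recorded
        }
    }

    /** Registers a glyph used in the document and returns its subset index. */
    public int mapGlyph(int glyphIndex, char unicode) {
        Integer existing = usedGlyphs.get(glyphIndex);
        if (existing != null) {
            // Assumption: the slot is already taken, so the Unicode value is dropped.
            return existing;
        }
        int subsetIndex = usedGlyphs.size();
        usedGlyphs.put(glyphIndex, subsetIndex);
        usedChars.put(subsetIndex, unicode);
        return subsetIndex;
    }

    /** Unicode value for a subset index, or U+FFFF if none was recorded. */
    public char toUnicode(int subsetIndex) {
        Character c = usedChars.get(subsetIndex);
        return c != null ? c : '\uFFFF';
    }

    public static void main(String[] args) {
        SubsetSketch threeReserved = new SubsetSketch(3); // indices 0, 1 and 2 fixed
        SubsetSketch notdefOnly = new SubsetSketch(1);    // only index 0 fixed

        // A font that has "space" on glyph index 1.
        int a = threeReserved.mapGlyph(1, ' ');
        int b = notdefOnly.mapGlyph(1, ' ');

        System.out.printf("three reserved: U+%04X%n", (int) threeReserved.toUnicode(a));
        System.out.printf("notdef only:    U+%04X%n", (int) notdefOnly.toUnicode(b));
    }
}
```

In this model a font with ".null" and "nonmarkingreturn" on indices 1
and 2 is not affected either way, since those glyphs are simply never
requested and index 0 stays reserved for .notdef.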

I'd say, put your patch in Bugzilla!

Jeremias Maerki