Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2016/01/26 19:46:40 UTC

[jira] [Comment Edited] (PDFBOX-3092) Format 4 TTF cmap table is parsed incorrectly

    [ https://issues.apache.org/jira/browse/PDFBOX-3092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15117728#comment-15117728 ] 

John Hewson edited comment on PDFBOX-3092 at 1/26/16 6:46 PM:
--------------------------------------------------------------

A cmap table doesn't need to define mappings for all glyphs. Many glyphs have no code points, e.g. composite glyphs (such as accents) and contextually substituted glyphs (from GSUB).

Arial Unicode is certainly not a broken font! Microsoft are responsible for most of the TrueType / OTL spec and it's a flagship font on Windows.

The cmap table in Arial Unicode contains 2496 entries, but FontBox returns only about 100, so PDFBox is missing roughly 95% of the cmap entries. That is why PDFBox fails to render glyphs that we know exist in both the cmap table and the font.
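
For reference, here is a minimal, self-contained sketch of how a Format 4 subtable can be decoded, following the layout in the OpenType cmap specification. It is not the FontBox code: the class name Format4CmapSketch, the parse() and u16() helpers, and the hand-built sample table in main() are all hypothetical, and the sketch assumes the subtable starts at offset 0 of the byte array. Counting the entries such a decoder produces for a given font is one way to cross-check what CmapSubtable#processSubtype4() returns.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; names are hypothetical and this is not FontBox code.
final class Format4CmapSketch {

    // Reads an unsigned 16-bit big-endian value at an absolute position.
    static int u16(ByteBuffer buf, int pos) {
        return buf.getShort(pos) & 0xFFFF;
    }

    // Decodes a format 4 subtable (assumed to start at offset 0 of data)
    // into a map from character code to glyph ID, skipping glyph ID 0.
    static Map<Integer, Integer> parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        if (u16(buf, 0) != 4) {
            throw new IllegalArgumentException("not a format 4 subtable");
        }
        int segCount = u16(buf, 6) / 2;              // header stores segCountX2
        int endCodes = 14;                           // first array after the 14-byte header
        int startCodes = endCodes + 2 * segCount + 2; // +2 skips reservedPad
        int idDeltas = startCodes + 2 * segCount;
        int idRangeOffsets = idDeltas + 2 * segCount;

        Map<Integer, Integer> map = new TreeMap<>();
        for (int i = 0; i < segCount; i++) {
            int start = u16(buf, startCodes + 2 * i);
            int end = u16(buf, endCodes + 2 * i);
            int delta = u16(buf, idDeltas + 2 * i);
            int rangeOffsetPos = idRangeOffsets + 2 * i;
            int rangeOffset = u16(buf, rangeOffsetPos);

            // 0xFFFF only marks the terminating segment, so it is never mapped.
            for (int c = start; c <= end && c != 0xFFFF; c++) {
                int gid;
                if (rangeOffset == 0) {
                    // Direct mapping: glyph ID is the code plus delta, modulo 65536.
                    gid = (c + delta) & 0xFFFF;
                } else {
                    // Indirect mapping: idRangeOffset is a byte offset measured from
                    // its own position in the table into glyphIdArray.
                    int pos = rangeOffsetPos + rangeOffset + 2 * (c - start);
                    gid = u16(buf, pos);
                    if (gid != 0) {
                        gid = (gid + delta) & 0xFFFF;
                    }
                }
                if (gid != 0) {
                    map.put(c, gid);
                }
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Hand-built sample table (hypothetical data): three segments,
        // 'A'..'C' via idDelta, 'a'..'b' via glyphIdArray, plus the 0xFFFF terminator.
        int[] words = {
            4, 44, 0, 6, 4, 1, 2,   // format, length, language, segCountX2, search fields
            0x43, 0x62, 0xFFFF,     // endCode[]
            0,                      // reservedPad
            0x41, 0x61, 0xFFFF,     // startCode[]
            0xFFC0, 0, 1,           // idDelta[]
            0, 4, 0,                // idRangeOffset[]
            10, 0                   // glyphIdArray[]
        };
        ByteBuffer buf = ByteBuffer.allocate(words.length * 2);
        for (int w : words) {
            buf.putShort((short) w);
        }
        Map<Integer, Integer> map = parse(buf.array());
        System.out.println(map.size() + " mappings: " + map);
    }
}
```

In the sample table, 'b' (0x62) points at a glyphIdArray slot holding 0, so it is treated as unmapped. This is the kind of entry that legitimately has no glyph, as opposed to entries that are dropped by a parsing error.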


was (Author: jahewson):
A cmap table doesn't need to define mappings for all glyphs, many glyphs have no code points, e.g composite glyphs (such as accents) and contextually substituted glyphs (from GSUB).

Arial Unicode is certainly not a broken font! Microsoft are responsible for most of the TrueType / OTL spec and it's a flagship font on Windows.

The cmap table in Arial Unicode contains 2496 entries, FontBox returns approx. 100 or so, so we're missing about 95% of the cmap entries in PDFBox.

> Format 4 TTF cmap table is parsed incorrectly
> ---------------------------------------------
>
>                 Key: PDFBOX-3092
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-3092
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>             Fix For: 2.1.0
>
>
> Certain large Format 4 cmap tables aren't being parsed correctly by CmapSubtable#processSubtype4(), for example in the font "ArialUnicodeMS".
> This results in missing glyphs when rendering the file from PDFBOX-2950, when "ArialUnicodeMS" is used as a substitute. You can force this to happen by changing the following line of PDCIDFontType2:
> {code}
> // find font or substitute
> CIDFontMapping mapping = FontMappers.instance()
>                                     .getCIDFont(getBaseFont(), getFontDescriptor(),
>                                                 getCIDSystemInfo());
> {code}
> Replace getBaseFont() with "ArialUnicodeMS".


