Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/01/03 19:56:01 UTC

[jira] [Comment Edited] (PDFBOX-1824) [PATCH] CFF fonts render wrong glyphs

    [ https://issues.apache.org/jira/browse/PDFBOX-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861772#comment-13861772 ] 

John Hewson edited comment on PDFBOX-1824 at 1/3/14 6:55 PM:
-------------------------------------------------------------

The patch "bimbo_historia.patch" fixes the issue with the Bimbo Historia PDF file. The problem was an assumption in the original CFFGlyph2D code that (Character Code => Glyph Name) mappings are 1:1, when they are actually Many:1.
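To illustrate why a Many:1 mapping breaks code that assumes 1:1, here is a minimal sketch (not the actual CFFGlyph2D code). The class and method names are hypothetical. It inverts a code-to-name map into a name-to-codes map, where each name maps to a set of codes. An inverse stored as a plain `Map<String, Integer>` would silently keep only one code per name. As a real-world case, PDF's WinAnsiEncoding assigns "space" to both code 32 and code 160.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of PDFBox.
final class ManyToOneEncoding {
    // Inverts a (character code => glyph name) map. Several codes may share
    // one glyph name, so each name maps to a set of codes rather than a
    // single code; a Map<String, Integer> would drop all but one of them.
    static Map<String, Set<Integer>> invert(Map<Integer, String> codeToName) {
        Map<String, Set<Integer>> nameToCodes = new TreeMap<>();
        for (Map.Entry<Integer, String> e : codeToName.entrySet()) {
            nameToCodes.computeIfAbsent(e.getValue(), k -> new TreeSet<>())
                       .add(e.getKey());
        }
        return nameToCodes;
    }
}
```

For example, inverting `{32=space, 160=space, 65=A}` yields `{A=[65], space=[32, 160]}`.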


was (Author: jahewson):
This patch fixes the issue with the Bimbo Historia PDF file. The problem was an assumption in the original CFFGlyph2D code that (Character Code => Glyph Name) mappings are 1:1, when they are actually Many:1.

> [PATCH] CFF fonts render wrong glyphs
> -------------------------------------
>
>                 Key: PDFBOX-1824
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-1824
>             Project: PDFBox
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: John Hewson
>            Assignee: Andreas Lehmkühler
>              Labels: patch
>             Fix For: 2.0.0
>
>         Attachments: 1.patch, 2.patch, 3.patch, Bimbo_Historia_20070409_Esp.pdf-2-rev-1554775.png, Bimbo_Historia_20070409_Esp.pdf-2-rev-current.png, all.patch, bimbo_historia.patch, calluna-11.pdf, patched.jpg, trunk.jpg
>
>
> I've found three very closely related CFF encoding issues in v2.0.0 when using PDFToImage.
> Problem 1
> ---------
> Look at line 7 of the poem: it should read "And the mouldering dust that years have made",
> but instead it reads "Afld the fioulderiflg dust that years have fiade".
> The CFF font is assumed to use CIDs, but it does not if it is not a ROS font.
> Therefore, we add a check for the CFF ROS class.
> Patch 1 fixes this.
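The kind of check Patch 1 adds can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself or the FontBox API. `CffFont`, `hasRos`, `cidToGid`, `nameToGid` and `gidFor` are hypothetical names. The sketch also assumes a CID-keyed font whose codes map directly to CIDs. The point is that charset values are CIDs only when the Top DICT carries an ROS operator. A name-keyed font must instead be resolved through glyph names. Treating its codes as CIDs picks unrelated glyphs, which is how "And" can come out as "Afld".

```java
import java.util.Map;

// Hypothetical sketch; not the FontBox API.
final class CffGlyphLookup {
    // Simplified view of a parsed CFF font. hasRos is true when the Top DICT
    // contains an ROS operator, i.e. the font is CID-keyed.
    record CffFont(boolean hasRos,
                   Map<Integer, Integer> cidToGid,
                   Map<String, Integer> nameToGid) {}

    // Resolves a glyph ID (GID) for a character code; GID 0 is .notdef.
    static int gidFor(CffFont font, int code, Map<Integer, String> codeToName) {
        if (font.hasRos()) {
            // CID-keyed font: the charset holds CIDs. Assumes the code maps
            // straight to a CID (an Identity-style mapping).
            return font.cidToGid().getOrDefault(code, 0);
        }
        // Name-keyed font: the charset holds SIDs (glyph names), so the code
        // goes through the encoding to a name before the glyph lookup.
        String name = codeToName.get(code);
        return name == null ? 0 : font.nameToGid().getOrDefault(name, 0);
    }
}
```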
> Problem 2
> ---------
> Look at line 3: "of right shoice" should be "of right choice".
> Likewise, on line 2 of the 2nd paragraph, "And a staunsh" should be "And a staunch";
> the st and ch ligatures are incorrect.
> This is because the font is a CFF ROS CID font and neither the st nor the ch ligature glyph
> has a name. The CFF format achieves this by using SIDs beyond the size of the string
> index, which map to .notdef. So there is a unique SID for each glyph, but not a unique name.
> Unfortunately, PDFBox assumes that Type 1 fonts have glyphs with unique names, and this
> assumption appears throughout the codebase. Because a glyph name and an SID perform essentially
> the same role, I recommend a simple solution: when an SID beyond the size of the
> string index is encountered, map it to a new name with the prefix "SID" instead of
> to .notdef; for example, SID 409 would map to the name "SID409". That way
> each glyph will have a unique name, which is what PDFBox assumes.
> Patch 2 fixes this.
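The naming scheme proposed above can be sketched as a single SID-to-name function. It is hypothetical and does not reproduce the Patch 2 code. The caller is assumed to supply the 391 predefined CFF standard strings (SIDs 0 to 390) and the font's String INDEX. The sketch relies on the CFF rule that SIDs from 391 upward index into the String INDEX. Any SID past the end of that INDEX gets a synthesized "SID<n>" name, so distinct SIDs keep distinct names.

```java
import java.util.List;

// Hypothetical sketch; not the PDFBox/FontBox API.
final class CffSidNames {
    // CFF predefines 391 standard strings (SIDs 0..390); SIDs from 391
    // upward index into the font's own String INDEX.
    static final int N_STD_STRINGS = 391;

    // Returns a glyph name for an SID. SIDs beyond the String INDEX get a
    // synthesized unique name ("SID409", ...) instead of all collapsing to
    // ".notdef".
    static String sidToName(int sid, String[] standardStrings, List<String> stringIndex) {
        if (sid < N_STD_STRINGS) {
            return standardStrings[sid];
        }
        int index = sid - N_STD_STRINGS;
        if (index < stringIndex.size()) {
            return stringIndex.get(index);
        }
        return "SID" + sid;
    }
}
```

With a two-entry String INDEX, SID 392 resolves to the second entry. SID 409 lies past the end of the INDEX and becomes "SID409".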
> Problem 3
> ---------
> Look at line 2, "That creepeth oÉer ruins old!": the word "o'er" is incorrectly rendered
> as "oÉer". This is because the Encoding entry in the PDF remaps code 201 from "Eacute" in the
> base encoding to "quoteright", but PDFBox ignores this remapping.
> In the CFFGlyph2D constructor, PDFBox examines the font's built-in charset. When the name
> "quoteright" is encountered, it is looked up in the PDF Encoding (i.e. nameToCode), where
> it is changed to code 201. Thus code 201 is associated with the "quoteright" glyph in the
> codeToGlyph map. This is correct.
> However, when the "Eacute" glyph is encountered later, its built-in charset code is also
> 201 (which is standard), so the codeToGlyph map entry is overwritten, resulting in
> code 201 being associated with the "Eacute" glyph.
> The solution is to build the codeToGlyph map in a strict order: first populate it from the
> font's built-in charset, then let the PDF Encoding overwrite any entries it defines.
> Patch 3 fixes this (and also replaces patch 2).
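The two-pass ordering described in Problem 3 can be sketched like this. It is a minimal illustration, not the Patch 3 code. `CodeToGlyphBuilder.build` is a hypothetical name, and the two input maps stand in for the font's built-in code-to-name mapping and the PDF Encoding. Because every PDF Encoding entry is applied after all built-in entries, a later built-in glyph such as "Eacute" can no longer clobber the "quoteright" remapping of code 201.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; not the CFFGlyph2D implementation.
final class CodeToGlyphBuilder {
    // Builds the code -> glyph-name map in two strict passes, so the result
    // does not depend on the order in which glyphs appear in the charset.
    static Map<Integer, String> build(Map<Integer, String> builtIn,
                                      Map<Integer, String> pdfEncoding) {
        // Pass 1: the font's built-in mapping supplies the defaults.
        Map<Integer, String> codeToGlyph = new HashMap<>(builtIn);
        // Pass 2: entries defined by the PDF Encoding overwrite those defaults.
        codeToGlyph.putAll(pdfEncoding);
        return codeToGlyph;
    }
}
```

Building from built-in `{201=Eacute, 65=A}` and PDF Encoding `{201=quoteright}` leaves code 65 as "A" and maps code 201 to "quoteright".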


