Posted to dev@pdfbox.apache.org by "ASF subversion and git services (Jira)" <ji...@apache.org> on 2021/11/20 13:03:00 UTC
[jira] [Commented] (PDFBOX-5328) Failing to get multiple encodings from cmap table

    [ https://issues.apache.org/jira/browse/PDFBOX-5328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17446820#comment-17446820 ] 

ASF subversion and git services commented on PDFBOX-5328:
---------------------------------------------------------

Commit 1895198 from Tilman Hausherr in branch 'pdfbox/branches/2.0'
[ https://svn.apache.org/r1895198 ]

PDFBOX-5328: load test file

> Failing to get multiple encodings from cmap table
> -------------------------------------------------
>
>                 Key: PDFBOX-5328
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5328
>             Project: PDFBox
>          Issue Type: Bug
>          Components: FontBox
>    Affects Versions: 1.8.16, 2.0.24
>            Reporter: Tilman Hausherr
>            Assignee: Tilman Hausherr
>            Priority: Minor
>             Fix For: 2.0.25, 3.0.0 PDFBox
>
>         Attachments: NotoSansSC-Regular.otf
>
>
> As reported by Ty Lewis in the users mailing list, see [here|https://mail-archives.apache.org/mod_mbox/pdfbox-users/202111.mbox/%3CCAPRgSAOG1a9kw4wSmArH0uG-N5xd9_kPq7ju4U%3DSv9H9CQZmcQ%40mail.gmail.com%3E]
> {noformat}
> Unicode encodings for GID 8712: List(U+f967)
> Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 3):
> List(U+4e0d, U+f967)
> Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 4):
> List(U+f967)
> {noformat}
> I wrote some Java code to reproduce this:
> {code}
> // The font classes used below are from org.apache.fontbox.ttf.
> // Parse the attached font (false: the font is not embedded in a PDF).
> File fontFile = new File("NotoSansSC-Regular.otf");
> OTFParser otfParser = new OTFParser(false);
> OpenTypeFont otf = otfParser.parse(fontFile);
> // Reverse lookup through the default Unicode cmap.
> CmapLookup unicodeCmapLookup = otf.getUnicodeCmapLookup();
> List<Integer> charCodes = unicodeCmapLookup.getCharCodes(8712);
> System.out.println(charCodes);
> // Same reverse lookup, done directly on the two Unicode subtables.
> CmapTable cmapTable = otf.getCmap();
> CmapSubtable unicodeFullCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_FULL);
> CmapSubtable unicodeBmpCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_BMP);
> List<Integer> unicodeBmpCharCodes = unicodeBmpCmapTable.getCharCodes(8712);
> List<Integer> unicodeFullCharCodes = unicodeFullCmapTable.getCharCodes(8712);
> System.out.println(unicodeBmpCharCodes);
> System.out.println(unicodeFullCharCodes);
> {code}
> A look at the tables with DTL OTMaster 3.7 light shows that there are indeed two entries. Searching for them (in hex) finds the characters 不 (U+4E0D) and 不 (U+F967).
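
To illustrate how one glyph ID can legitimately carry more than one code point, here is a minimal Java sketch. It is not FontBox code: the class ReverseCmapSketch and its methods are hypothetical names, and the only data taken from the report are GID 8712 and the code points U+4E0D and U+F967. It contrasts a reverse lookup that keeps a single code point per glyph with one that keeps all of them. Whether the actual defect works this way is not stated in the issue.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; this is not how FontBox is implemented.
class ReverseCmapSketch {

    // Lossy reverse lookup: one code point per glyph ID.
    // When two code points share a glyph, the later entry overwrites the earlier one.
    static Map<Integer, Integer> reverseSingle(Map<Integer, Integer> codeToGid) {
        Map<Integer, Integer> gidToCode = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : codeToGid.entrySet()) {
            gidToCode.put(e.getValue(), e.getKey());
        }
        return gidToCode;
    }

    // Reverse lookup that keeps every code point mapped to a glyph ID, in ascending order.
    static Map<Integer, List<Integer>> reverseMulti(Map<Integer, Integer> codeToGid) {
        Map<Integer, List<Integer>> gidToCodes = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : codeToGid.entrySet()) {
            gidToCodes.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }
        for (List<Integer> codes : gidToCodes.values()) {
            Collections.sort(codes);
        }
        return gidToCodes;
    }

    // Format a code point in the "U+xxxx" style used in the report.
    static String format(int codePoint) {
        return String.format("U+%04x", codePoint);
    }

    public static void main(String[] args) {
        // Forward mapping (code point -> glyph ID) with the two entries from the report.
        Map<Integer, Integer> codeToGid = new TreeMap<>();
        codeToGid.put(0x4E0D, 8712);
        codeToGid.put(0xF967, 8712);

        System.out.println("single: " + format(reverseSingle(codeToGid).get(8712)));

        List<String> all = new ArrayList<>();
        for (int code : reverseMulti(codeToGid).get(8712)) {
            all.add(format(code));
        }
        System.out.println("multi:  " + all);
    }
}
```

With the two entries above, the single-valued map can return only one code point for GID 8712, while the list-valued map returns both.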


