Posted to dev@pdfbox.apache.org by "Tilman Hausherr (Jira)" <ji...@apache.org> on 2021/11/20 12:07:00 UTC
[jira] [Created] (PDFBOX-5328) Failing to get multiple encodings from cmap table
Tilman Hausherr created PDFBOX-5328:
---------------------------------------
Summary: Failing to get multiple encodings from cmap table
Key: PDFBOX-5328
URL: https://issues.apache.org/jira/browse/PDFBOX-5328
Project: PDFBox
Issue Type: Bug
Components: FontBox
Affects Versions: 2.0.24, 1.8.16
Reporter: Tilman Hausherr
Assignee: Tilman Hausherr
Fix For: 1.8.17, 2.0.25, 3.0.0 PDFBox
Attachments: NotoSansSC-Regular.otf
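As reported by Ty Lewis on the users mailing list (see [here|https://mail-archives.apache.org/mod_mbox/pdfbox-users/202111.mbox/%3CCAPRgSAOG1a9kw4wSmArH0uG-N5xd9_kPq7ju4U%3DSv9H9CQZmcQ%40mail.gmail.com%3E]), the Unicode cmap lookup returns only one code point for GID 8712, although the (platformId = 0, encodingId = 3) subtable maps two: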
{noformat}
Unicode encodings for GID 8712: List(U+f967)
Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 3):
List(U+4e0d, U+f967)
Unicode encodings for GID 8712 from table (platformId = 0 encodingId = 4):
List(U+f967)
{noformat}
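I wrote some Java code to reproduce this (FontBox 2.0.x API):
{code}
import java.io.File;
import java.io.IOException;
import java.util.List;
import org.apache.fontbox.ttf.CmapLookup;
import org.apache.fontbox.ttf.CmapSubtable;
import org.apache.fontbox.ttf.CmapTable;
import org.apache.fontbox.ttf.OTFParser;
import org.apache.fontbox.ttf.OpenTypeFont;

public class PDFBox5328 {
    public static void main(String[] args) throws IOException {
        // false = standalone font file, not a font embedded in a PDF
        OTFParser otfParser = new OTFParser(false);
        try (OpenTypeFont otf = otfParser.parse(new File("NotoSansSC-Regular.otf"))) {
            // code points that the preferred Unicode cmap lookup maps to GID 8712
            CmapLookup unicodeCmapLookup = otf.getUnicodeCmapLookup();
            System.out.println(unicodeCmapLookup.getCharCodes(8712));
            // query the (0, 3) BMP and (0, 4) full-repertoire subtables directly
            CmapTable cmapTable = otf.getCmap();
            CmapSubtable unicodeBmpCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_BMP);
            CmapSubtable unicodeFullCmapTable = cmapTable.getSubtable(CmapTable.PLATFORM_UNICODE, CmapTable.ENCODING_UNICODE_2_0_FULL);
            List<Integer> unicodeBmpCharCodes = unicodeBmpCmapTable.getCharCodes(8712);
            List<Integer> unicodeFullCharCodes = unicodeFullCmapTable.getCharCodes(8712);
            System.out.println(unicodeBmpCharCodes);
            System.out.println(unicodeFullCharCodes);
        }
    }
}
{code}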
A look at the tables with DTL OTMaster 3.7 Light shows that there are indeed two entries for GID 8712: 不 (U+4E0D) and 不 (U+F967, the CJK compatibility ideograph for the same character).
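To illustrate why a reverse lookup (GID to code points) has to keep a list per GID, here is a minimal, hypothetical sketch. It is not PDFBox's implementation; the class and method names (ReverseCmapSketch, buildReverseMap, format) are made up for illustration. It inverts a code point to GID mapping, using the two mappings for GID 8712 from the (0, 3) subtable above as data, and prints the result in the same U+xxxx style as the report.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration, not PDFBox code: invert a code point -> GID
// mapping while keeping every code point that maps to a given GID.
public class ReverseCmapSketch {

    // Builds GID -> list of code points. Iterating a TreeMap visits code
    // points in ascending order, so each list comes out sorted.
    static Map<Integer, List<Integer>> buildReverseMap(Map<Integer, Integer> codePointToGid) {
        Map<Integer, List<Integer>> gidToCodePoints = new TreeMap<>();
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(codePointToGid).entrySet()) {
            // computeIfAbsent + add appends instead of overwriting, so a GID
            // reachable from two code points keeps both of them.
            gidToCodePoints.computeIfAbsent(e.getValue(), gid -> new ArrayList<>()).add(e.getKey());
        }
        return gidToCodePoints;
    }

    // Formats code points like the report's output, e.g. "U+4e0d".
    static String format(List<Integer> codePoints) {
        List<String> parts = new ArrayList<>();
        for (int cp : codePoints) {
            parts.add(String.format("U+%04x", cp));
        }
        return parts.toString();
    }

    public static void main(String[] args) {
        // The two mappings for GID 8712 reported for the (0, 3) subtable.
        Map<Integer, Integer> bmpSubtable = Map.of(0x4E0D, 8712, 0xF967, 8712);
        Map<Integer, List<Integer>> reverse = buildReverseMap(bmpSubtable);
        System.out.println("GID 8712 -> " + format(reverse.get(8712)));
    }
}
{code}

Running main prints "GID 8712 -> [U+4e0d, U+f967]". By contrast, a plain Map<Integer, Integer> from GID to code point would silently keep only whichever entry was put last.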
--
This message was sent by Atlassian Jira
(v8.20.1#820001)