
Posted to dev@pdfbox.apache.org by "Andreas Lehmkühler (JIRA)" <ji...@apache.org> on 2010/09/05 12:51:32 UTC

[jira] Resolved: (PDFBOX-11) CID to Unicode mapping

     [ https://issues.apache.org/jira/browse/PDFBOX-11?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andreas Lehmkühler resolved PDFBOX-11.
--------------------------------------

    Fix Version/s: 1.3.0
       Resolution: Fixed

Revision 992756 added support for cidchar and improved the support for cidrange.

> CID to Unicode mapping
> ----------------------
>
>                 Key: PDFBOX-11
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-11
>             Project: PDFBox
>          Issue Type: New Feature
>          Components: Text extraction
>             Fix For: 1.3.0
>
>
> [imported from SourceForge]
> http://sourceforge.net/tracker/index.php?group_id=78314&atid=552835&aid=931628
> Originally submitted by vadimbit on 2004-04-08 02:45.
> For extracting CJK text it would be useful to map CID-keyed
> characters to Unicode.
> For example, the "90ms-RKSJ-UCS2" CMap file can be used
> to retrieve Unicode values for the "90ms-RKSJ-H" and
> "90ms-RKSJ-V" encodings of CID fonts.
> Currently CMapParser parses "bfrange" and "bfchar". That is
> enough for parsing ToUnicode CMap files.
> So, as I understand it, only an "encoding name to ToUnicode
> CMap file name" mapping is needed.
> [comment on SourceForge]
> Originally sent by vadimbit.
> Some additional information...
> There is a better way to retrieve the Unicode value:
> We should get the CID using the natural CMap file and map it to
> Unicode using the appropriate Uni* CMap file (backward
> mapping)...
> So "cidrange" and "cidchar" should be parsed too...
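
The two-step lookup described in the comment above (character code to CID through the font's own CMap, then CID back to Unicode through a Uni* CMap) could be sketched roughly as follows. This is only an illustration: the class CidUnicodeSketch and its methods are hypothetical and are not PDFBox's CMapParser API, and the range values in main are illustrative entries that should be checked against the actual CMap files before being relied on.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a CMap holding "cidrange"/"cidchar" entries; not the PDFBox API. */
class CidUnicodeSketch {

    /** One "cidrange" entry: codes start..end map to CIDs cid, cid+1, ... */
    static final class Range {
        final int start;
        final int end;
        final int cid;

        Range(int start, int end, int cid) {
            this.start = start;
            this.end = end;
            this.cid = cid;
        }
    }

    private final List<Range> ranges = new ArrayList<>();

    /** Registers a "cidrange" entry. */
    void addCidRange(int start, int end, int cid) {
        ranges.add(new Range(start, end, cid));
    }

    /** A "cidchar" entry is treated as a range of length one. */
    void addCidChar(int code, int cid) {
        addCidRange(code, code, cid);
    }

    /** Forward lookup: character code to CID, or -1 if the code is unmapped. */
    int toCid(int code) {
        for (Range r : ranges) {
            if (code >= r.start && code <= r.end) {
                return r.cid + (code - r.start);
            }
        }
        return -1;
    }

    /** Backward lookup: CID to character code, or -1 if the CID is unmapped. */
    int toCode(int cid) {
        for (Range r : ranges) {
            int lastCid = r.cid + (r.end - r.start);
            if (cid >= r.cid && cid <= lastCid) {
                return r.start + (cid - r.cid);
            }
        }
        return -1;
    }

    /** Code in the font's encoding -> CID (natural CMap) -> Unicode (Uni* CMap, read backward). */
    static int toUnicode(CidUnicodeSketch natural, CidUnicodeSketch uni, int charCode) {
        int cid = natural.toCid(charCode);
        return cid < 0 ? -1 : uni.toCode(cid);
    }

    public static void main(String[] args) {
        // Illustrative entries only (assumed, not copied from the real CMap files).
        CidUnicodeSketch natural = new CidUnicodeSketch(); // stands in for e.g. 90ms-RKSJ-H
        natural.addCidRange(0x829F, 0x82F1, 842);
        CidUnicodeSketch uni = new CidUnicodeSketch();     // stands in for a Uni* CMap
        uni.addCidRange(0x3041, 0x3093, 842);

        System.out.println(String.format("U+%04X", toUnicode(natural, uni, 0x82A0)));
    }
}
```

A linear scan over the ranges keeps the sketch short; a real parser would more likely keep the ranges sorted or indexed, and would also need to respect the code space ranges that determine how many bytes each character code occupies.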

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.