Posted to dev@pdfbox.apache.org by "John Hewson (JIRA)" <ji...@apache.org> on 2014/12/03 04:57:13 UTC

[jira] [Comment Edited] (PDFBOX-2524) [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages

    [ https://issues.apache.org/jira/browse/PDFBOX-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232539#comment-14232539 ] 

John Hewson edited comment on PDFBOX-2524 at 12/3/14 3:56 AM:
--------------------------------------------------------------

Thanks, yes you understood correctly. Here's my review:

- Your new PDType0Font constructor shouldn't call readEncoding() or fetchCMapUCS2(), as those methods are for reading a font from a PDF, not embedding a new font.
- getFontWidthsArray parses a string of space-delimited integers, which you created in PDCIDFontType2Embedder#setItemsForCIDFont. These two methods should not use strings to exchange data. What was your reason for doing this?
- CmapSubtable#getGlyphIdToCharacterCode() exposes private implementation details of CmapSubtable. However, I'd recommend using CID = GID rather than your current approach, in which case you won't need this information anyway.
- Using CID = GID would also make getCIDToGID redundant and would generate smaller PDF files, because you can use the Identity cid2gid mapping.
- Please remove unused import statements.
- Please do not use wildcard (.*) imports.
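
To illustrate the points about exchanging widths and using CID = GID, here is a minimal sketch. The class and method names (CidWidthsSketch, scaleToGlyphSpace) are hypothetical and are not part of PDFBox or of the patch. It assumes the advance widths have already been read from the TrueType font, indexed by glyph ID, and it passes them around as an int[] instead of a formatted string. Because CID = GID, the resulting array can be indexed by CID directly and no separate CID-to-GID table is needed.

```java
import java.util.Arrays;

// Hypothetical sketch, not PDFBox API: widths are exchanged as int[] rather
// than as a space-delimited string, and CID is assumed to equal GID.
final class CidWidthsSketch {

    /**
     * Scales TrueType advance widths (in font units, indexed by glyph ID)
     * to the 1000-units-per-em space used for PDF font widths.
     * With CID = GID, the returned array is indexed by CID as well.
     */
    static int[] scaleToGlyphSpace(int[] advanceWidthsByGid, int unitsPerEm) {
        if (unitsPerEm <= 0) {
            throw new IllegalArgumentException("unitsPerEm must be positive");
        }
        int[] scaled = new int[advanceWidthsByGid.length];
        for (int gid = 0; gid < advanceWidthsByGid.length; gid++) {
            scaled[gid] = Math.round(advanceWidthsByGid[gid] * 1000f / unitsPerEm);
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Made-up widths for three glyphs in a font with 2048 units per em.
        int[] widths = scaleToGlyphSpace(new int[] {1024, 2048, 512}, 2048);
        System.out.println(Arrays.toString(widths));
    }
}
```

The caller that builds the font dictionary can then consume the int[] directly, so nothing has to be formatted and re-parsed between the embedder and the font class.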


> [PATCH] Two PDFont to create PDF documents in CJK and non-ISO-8859-1 languages
> ------------------------------------------------------------------------------
>
>                 Key: PDFBOX-2524
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-2524
>             Project: PDFBox
>          Issue Type: Improvement
>          Components: Writing
>    Affects Versions: 2.0.0
>            Reporter: Keiji Suzuki
>            Assignee: John Hewson
>         Attachments: Type0.java, Type0CJK.java, Type0Unicode.java, cidtype0.diff, two-new-fonts.diff
>
>
> I made two PDFont classes for creating PDF documents in CJK and non-ISO-8859-1 languages.
> One is PDType0CJKFont. This is for using the CJK fonts included in the Asian font package of Adobe Reader. It doesn't require the target font to be available when the PDF document is created. It uses UTF-16 as the text encoding and supports surrogate pair characters.
> The other is PDType0UnicodeFont. This is for using a TrueType Type0 font that can handle any Unicode character, such as ArialUnicodeMS. Only the characters actually used in the document are embedded. To achieve this, you have to call the PDType0Unicode.reloadFont() method just before closing the PDPageContentStream. I think this requirement is ugly, but I could not think of a suitable way to remove it. This font uses the original glyph codes of the embedded font as text codes and also supports surrogate pair characters.
> Example programs using these two fonts are also attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)