Posted to fop-dev@xmlgraphics.apache.org by monejava <gi...@git.apache.org> on 2016/09/19 16:23:03 UTC

[GitHub] fop pull request #3: FOP-1969: Surrogate pairs not treated as single unicode...

GitHub user monejava opened a pull request:

    https://github.com/apache/fop/pull/3

    FOP-1969: Surrogate pairs not treated as single unicode codepoint for…

    Implemented correct handling of surrogate pairs in Apache FOP. The supported renderers are PDF, PS and PNG. Tests were added where possible.
    
    Here is a brief explanation of the design choices I made when modifying the public API:
    
    `mapChar(char)`/`hasChar(char)`: these are defined in `Typeface`, which means they have more than 20 implementations. Modifying this interface would require a lot of work and might introduce many bugs. That is why Glenn Adams (our contact in the Apache FOP project) asked us to create new methods rather than change the existing ones. In some of these implementations, such as `SingleByteFont`, it is semantically correct to represent a character with a single UTF-16 code unit. In others, such as `CIDFont` (http://www.adobe.com/products/postscript/pdfs/cid.pdf), it is not, since they are meant to cover a range wider than 2^16 characters.
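
To illustrate why a `char`-based API cannot address every character, here is a minimal sketch using only the standard `java.lang.Character` and `String` APIs. The class and method names are made up for this example and are not part of FOP.

```java
// Illustration only: SurrogatePairDemo is a hypothetical class, not FOP code.
final class SurrogatePairDemo {

    /** Number of UTF-16 code units (Java chars) in the string. */
    static int utf16Length(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
        // encodes it as a surrogate pair (two chars).
        String s = new String(Character.toChars(0x1F600));

        System.out.println(utf16Length(s));                       // prints 2
        System.out.println(codePointLength(s));                   // prints 1
        System.out.println(Character.isHighSurrogate(s.charAt(0))); // prints true
        System.out.println(Character.isLowSurrogate(s.charAt(1)));  // prints true
    }
}
```

A method that takes a single `char` would see the two halves of the pair separately and could never look up U+1F600 as one character.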
    
    `mapCodePoint(int)`/`hasCodePoint(int)`: I added these two methods to the `CIDFont` class. They take an int (a code point) instead of a char so that we can cover the full Unicode range. As you can see from the `Typeface` hierarchy, this change affects only 2 classes.
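
The idea of an int-based lookup can be sketched as follows. This is not the code from the pull request: the class `CodePointFontSketch`, its `register` helper, the map-backed storage, and the use of 0 as a "not found" glyph index are all assumptions made for illustration. Only the method names `hasCodePoint(int)` and `mapCodePoint(int)` come from the description above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a code-point-aware font; not FOP's CIDFont.
final class CodePointFontSketch {
    /** Assumed "not found" glyph index for this sketch. */
    static final int NOT_FOUND = 0;

    private final Map<Integer, Integer> glyphByCodePoint = new HashMap<>();

    /** Hypothetical helper that records a code point to glyph index mapping. */
    void register(int codePoint, int glyphIndex) {
        glyphByCodePoint.put(codePoint, glyphIndex);
    }

    boolean hasCodePoint(int codePoint) {
        return glyphByCodePoint.containsKey(codePoint);
    }

    int mapCodePoint(int codePoint) {
        Integer glyph = glyphByCodePoint.get(codePoint);
        return glyph != null ? glyph : NOT_FOUND;
    }

    /** Maps text one code point at a time, so a surrogate pair yields one glyph. */
    List<Integer> mapText(String text) {
        List<Integer> glyphs = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            glyphs.add(mapCodePoint(codePoint));
            // Advance by 1 char for BMP characters, 2 for a surrogate pair.
            i += Character.charCount(codePoint);
        }
        return glyphs;
    }
}
```

With this shape, a caller that walks the text by code point gets one glyph index per character, whether or not the character needs a surrogate pair in UTF-16.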
    
    `getUnicode()`: this is defined in `CIDSet` (it is not a member of the `Typeface` class or one of its subclasses). I changed the signature of this method to handle int instead of char because it is semantically incorrect to represent a Unicode code point with a single UTF-16 char. As you can see from the `CIDSet` hierarchy, the change affects only 3 classes.
    
    `getUnicodeFromGID()`: this method is defined in `CustomFont` and `CIDSet`. It never gets called from the `MultiByteFont` path, probably because `getUnicode` is used instead. That is why I'm downcasting the return value from int to char in `CIDFull` and `CIDSubset`. Probably the best thing to do would be to get rid of this method or make it handle int, but again the change would affect more classes than the ones in our scope.
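
For context on the int-to-char downcast mentioned above, the following sketch shows what Java's narrowing conversion does to a code point. The class and method names are hypothetical and this is not the FOP code; it only demonstrates standard language behavior.

```java
// Illustration only: NarrowingCastDemo is a hypothetical class, not FOP code.
final class NarrowingCastDemo {

    /** Narrows a code point to a char, keeping only the low 16 bits. */
    static char narrow(int codePoint) {
        return (char) codePoint;
    }

    /** True when the code point is in the BMP and survives the cast unchanged. */
    static boolean fitsInChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        // BMP code point: the cast is lossless.
        System.out.println(narrow(0x41) == 'A');        // prints true
        // Supplementary code point: the high bits are dropped,
        // so U+1F600 becomes U+F600.
        System.out.println(narrow(0x1F600) == 0xF600);  // prints true
        System.out.println(fitsInChar(0x1F600));        // prints false
    }
}
```

The cast is harmless only as long as the value is in the Basic Multilingual Plane, which is why a method that is reachable with supplementary code points would eventually need an int return type.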

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/monejava/fop surrogate_pairs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/fop/pull/3.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3
    
----
commit 111d6a6fa58c313293e9b79e245c8521778de2c8
Author: Rondelli <ro...@amazon.com>
Date:   2016-09-19T15:13:09Z

    FOP-1969: Surrogate pairs not treated as single unicode codepoint for display purposes

----

