Posted to fop-dev@xmlgraphics.apache.org by "Peter B. West" <li...@pbw.id.au> on 2010/05/21 15:31:42 UTC

Re: svn commit: r946585 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/afp/fonts: AFPFont.java AbstractOutlineFont.java CharacterSet.java CharacterSetBuilder.java CharacterSetOrientation.java DoubleByteFont.java FopCharacterSet.java RasterFont.j

I'm puzzled by this discussion. AFAIK, Java has rejected moving to 32 bits in Java 5. Instead, they are supporting supplementary characters. There's a discussion here: http://java.sun.com/developer/technicalArticles/Intl/Supplementary/

Peter West
"Lord, to whom shall we go?"




On 21/05/2010, at 11:11 PM, Glenn Adams wrote:

> I concur with this change, and have already made some changes in this direction in my work on adding complex script support.
> 
> Please note that it is not quite so simple as merely changing from char to int in some locations. It is also necessary to convert from UTF-16 to UTF-32, i.e., to the full Unicode code point value, which can range from 0x000000 through 0x10FFFF (see Unicode 5.2, Section 3.3, Item D9). It is probably not a good idea to make this conversion too early, but rather, to defer it until certain well defined interface points, which need to be documented as taking the full Unicode code point, and not merely a UTF-16 code element.
> 
> On Fri, May 21, 2010 at 3:46 AM, Vincent Hennebert <vh...@gmail.com> wrote:
> Hi,
> 
> > Author: jeremias
> > Date: Thu May 20 09:52:27 2010
> > New Revision: 946585
> >
> > URL: http://svn.apache.org/viewvc?rev=946585&view=rev
> > Log:
> > Changed many variables and parameters from "int" to "char" because AFP font support mostly uses Unicode code points unlike Type 1 and TrueType support which use internal character code points (the result of Font.mapChar()). This should improve code readability.
> 
> Not sure this is a desirable change. char can only address characters
> from the Basic Multilingual Plane. Java 1.5 has started to use int to
> overcome that issue actually. So unless there is a fundamental
> limitation in AFP such that characters beyond the BMP will never be
> usable, I think we want to stick to int.
> 
> <snip/>
> 
> Vincent
> 


Re: svn commit: r946585 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/afp/fonts: AFPFont.java AbstractOutlineFont.java CharacterSet.java CharacterSetBuilder.java CharacterSetOrientation.java DoubleByteFont.java FopCharacterSet.java RasterFont.j

Posted by "Peter B. West" <li...@pbw.id.au>.
Sorry, I wasn't paying enough attention.

Yes, when dealing with individual character interfaces, you need to provide codepoint as well as char. The relationship between codepoints and strings is not straightforward, however.
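
To illustrate why the relationship between code points and strings is not one-to-one, here is a minimal sketch using only standard java.lang methods. The class and method names (CodePointWalk, countCodePoints, toCodePoints) are made up for this example and are not part of FOP.

```java
class CodePointWalk {
    /** Counts Unicode code points rather than UTF-16 char elements. */
    static int countCodePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Collects the code points of s, advancing one or two chars per step. */
    static int[] toCodePoints(String s) {
        int[] result = new int[countCodePoints(s)];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            // 1 for a BMP character, 2 for a supplementary character
            i += Character.charCount(cp);
        }
        return result;
    }

    public static void main(String[] args) {
        // "a" followed by U+2F81A, which is stored as a surrogate pair
        String s = "a\uD87E\uDC1A";
        System.out.println(s.length());          // 3
        System.out.println(countCodePoints(s));  // 2
        for (int cp : toCodePoints(s)) {
            System.out.println(Integer.toHexString(cp));
        }
    }
}
```

The point of the sketch is that an index into a String counts char elements, so code that walks a string one char at a time can land in the middle of a surrogate pair.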

Peter West
"Lord, to whom shall we go?"




On 22/05/2010, at 12:14 AM, Glenn Adams wrote:

> it's a simple problem, which can be stated as follows:
> 	• the "char" data type in Java does not denote a character, rather, it denotes a UTF-16 encoding element
> 	• some Unicode characters, i.e., those in the BMP, are represented by one char element (char[1]), while other Unicode characters require two char elements (char[2]);
> 	• in order to make use of non-BMP characters, of which there are now many standardized instances, one must either pass a char array, e.g., char[2], or, alternatively pass an int, which is capable of representing all Unicode code points in the range of 0 ... 0x10FFFF;
> 
> at some point, FOP needs to support the effective use of characters outside the BMP coding space, and, consequently, those FOP interfaces that use the char type need to be upgraded to int;
> 
> I am referring to FOP defined interfaces mind you, not Java defined interfaces; in general, the Java interfaces provide mechanisms to address this problem; for instance, see the discussion in the preamble of the definition of java.lang.Character, the pertinent point of which I repeat below:
> 	• The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value if followed by any low-surrogate value in a string would represent a letter.
> 	• The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).
> 
> what I believe the original commenter is pointing out (and that I am agreeing with) is that FOP needs to take care to not use the char type for interface parameters that are intended to denote a Unicode character; or, if they do, then an overloaded version of the same interface that uses the int type should also be provided;
> 
> for example, the following interfaces need to be upgraded to int or to have an overloaded int variant:
> 
> org.apache.fop.fonts.Font.getKernValue(char ch1, char ch2);
> org.apache.fop.fonts.Font.getWidth(char charnum);
> org.apache.fop.fonts.Font.mapChar(char c);
> org.apache.fop.fonts.Font.hasChar(char c);
> org.apache.fop.fo.CharIterator.replaceChar(char c);
> org.apache.fop.fo.flow.Character.getCharacter();
> org.apache.fop.util.CharUtilities.*;
> ...
> 
> i have already upgraded all of the CharUtilities.* methods to use int instead of char in my present work on complex script support, but there are a variety of other internal interfaces as noted above that need to be upgraded as well. if you like, I can fold this into my present work, or assign it a new bug number (which may be the best for tracking purposes);
> 
> regards,
> glenn
> 


Re: svn commit: r946585 - in /xmlgraphics/fop/trunk/src/java/org/apache/fop/afp/fonts: AFPFont.java AbstractOutlineFont.java CharacterSet.java CharacterSetBuilder.java CharacterSetOrientation.java DoubleByteFont.java FopCharacterSet.java RasterFont.j

Posted by Glenn Adams <gl...@skynav.com>.
it's a simple problem, which can be stated as follows:

   - the "char" data type in Java does not denote a character, rather, it
   denotes a UTF-16 encoding element
   - some Unicode characters, i.e., those in the BMP, are represented by one
   char element (char[1]), while other Unicode characters require two char
   elements (char[2]);
   - in order to make use of non-BMP characters, of which there are now many
   standardized instances, one must either pass a char array, e.g., char[2],
   or, alternatively pass an int, which is capable of representing all Unicode
   code points in the range of 0 ... 0x10FFFF;
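
a minimal sketch of the two representations described above, using only java.lang.Character methods; the class and method names (SurrogateDemo, encode, decode) are hypothetical and exist only for this illustration:

```java
class SurrogateDemo {
    /** Encodes one code point as UTF-16: char[1] for the BMP, char[2] otherwise. */
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    /** Decodes the code point starting at index i, combining a surrogate pair if present. */
    static int decode(char[] chars, int i) {
        return Character.codePointAt(chars, i);
    }

    public static void main(String[] args) {
        char[] bmp = encode(0x4E2D);    // a BMP ideograph
        char[] supp = encode(0x2F81A);  // a supplementary ideograph
        System.out.println(bmp.length);   // 1
        System.out.println(supp.length);  // 2
        System.out.println(Integer.toHexString(decode(supp, 0)));  // 2f81a
        System.out.println(Character.isHighSurrogate(supp[0]));    // true
    }
}
```

passing the int form keeps a character in a single value, whereas the char[2] form obliges every callee to know that the two elements belong together.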

at some point, FOP needs to support the effective use of characters outside
the BMP coding space, and, consequently, those FOP interfaces that use the
char type need to be upgraded to int;

I am referring to FOP defined interfaces mind you, not Java defined
interfaces; in general, the Java interfaces provide mechanisms to address
this problem; for instance, see the discussion in the preamble of the
definition of java.lang.Character<http://java.sun.com/javase/6/docs/api/java/lang/Character.html>,
the pertinent point of which I repeat below:

   - The methods that only accept a char value cannot support supplementary
   characters. They treat char values from the surrogate ranges as undefined
   characters. For example, Character.isLetter('\uD840') returns false, even
   though this specific value if followed by any low-surrogate value in a
   string would represent a letter.
   - The methods that accept an int value support all Unicode characters,
   including supplementary characters. For example,
   Character.isLetter(0x2F81A) returns true because the code point value
   represents a letter (a CJK ideograph).
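
the two cases quoted above can be checked directly; this is only a small sketch, and the wrapper names (IsLetterDemo, letterAsChar, letterAsCodePoint) are invented for the example:

```java
class IsLetterDemo {
    /** Calls the char overload, which sees only one UTF-16 element. */
    static boolean letterAsChar(char c) {
        return Character.isLetter(c);
    }

    /** Calls the int overload, which sees the full code point. */
    static boolean letterAsCodePoint(int cp) {
        return Character.isLetter(cp);
    }

    public static void main(String[] args) {
        char high = '\uD840';  // a high surrogate on its own
        System.out.println(letterAsChar(high));          // false
        System.out.println(letterAsCodePoint(0x2F81A));  // true

        // Combining the high surrogate with a low surrogate gives U+20000,
        // which the int overload can classify.
        int cp = Character.toCodePoint(high, '\uDC00');
        System.out.println(Integer.toHexString(cp));     // 20000
        System.out.println(letterAsCodePoint(cp));       // true
    }
}
```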

what I believe the original commenter is pointing out (and that I am
agreeing with) is that FOP needs to take care to not use the *char* type for
interface parameters that are intended to denote a Unicode character; or, if
they do, then an overloaded version of the same interface that uses the
*int* type should also be provided;

for example, the following interfaces need to be upgraded to int or to have
an overloaded int variant:

org.apache.fop.fonts.Font.getKernValue(char ch1, char ch2);
org.apache.fop.fonts.Font.getWidth(char charnum);
org.apache.fop.fonts.Font.mapChar(char c);
org.apache.fop.fonts.Font.hasChar(char c);
org.apache.fop.fo.CharIterator.replaceChar(char c);
org.apache.fop.fo.flow.Character.getCharacter();
org.apache.fop.util.CharUtilities.*;
...
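
as a rough sketch of what "an overloaded int variant" could look like, assuming a simple width table: SketchFont and putWidth are hypothetical names, the method bodies are invented, and none of this is FOP's actual Font implementation; only the method names hasChar and getWidth are borrowed from the list above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a font class; not FOP's actual Font API. */
class SketchFont {
    private final Map<Integer, Integer> widths = new HashMap<>();

    /** Registers a width for a full Unicode code point (assumed helper). */
    void putWidth(int codePoint, int width) {
        widths.put(codePoint, width);
    }

    /** Primary variant: accepts any code point in 0 ... 0x10FFFF. */
    boolean hasChar(int codePoint) {
        return widths.containsKey(codePoint);
    }

    /** Overload kept for existing callers; widening char to int is lossless. */
    boolean hasChar(char c) {
        return hasChar((int) c);
    }

    /** Primary variant: returns 0 when the code point has no entry. */
    int getWidth(int codePoint) {
        Integer w = widths.get(codePoint);
        return w != null ? w : 0;
    }

    /** Overload kept for existing callers. */
    int getWidth(char c) {
        return getWidth((int) c);
    }

    public static void main(String[] args) {
        SketchFont font = new SketchFont();
        font.putWidth('A', 600);
        font.putWidth(0x2F81A, 1000);
        System.out.println(font.getWidth('A'));      // 600
        System.out.println(font.getWidth(0x2F81A));  // 1000
        System.out.println(font.hasChar('\uD87E'));  // false
    }
}
```

the char overloads simply delegate, so BMP-only callers keep compiling, while a lone surrogate passed through them is looked up as its own value and is not found.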

i have already upgraded all of the CharUtilities.* methods to use int instead
of char in my present work on complex script support, but there are a
variety of other internal interfaces as noted above that need to be upgraded
as well. if you like, I can fold this into my present work, or assign it a
new bug number (which may be the best for tracking purposes);

regards,
glenn
