Posted to dev@harmony.apache.org by Tony Wu <wu...@gmail.com> on 2006/10/17 08:49:47 UTC

[classlib][luni][charset]Strange behavior of UnicodeBig

Hi all,
I found this while debugging the failing Ant tests on
Harmony. Note the output of the test cases below.

import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import junit.framework.TestCase;

public class TestCharset extends TestCase {
    public void test1() throws UnsupportedEncodingException {
        byte[] b = new byte[] { 'a', 'b', 'c' };
        String s = new String(b, "UnicodeBig");
        assertEquals("abc", s);
    }

    public void test2() {
        Charset.forName("UnicodeBig");
    }
}

RI:
test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig

Harmony:
test1:java.nio.charset.UnsupportedCharsetException: UnicodeBig
test2:
java.nio.charset.UnsupportedCharsetException: The unsupported charset
name is "UnicodeBig"

It seems the RI recognizes *UnicodeBig* in the constructor of j.l.String
but not in Charset.forName, whereas Harmony does not support this name at all.

Do you have any concerns about that?
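
The two lookup paths compared above can be probed side by side. This is only a sketch: the class and method names are made up for illustration and are not part of Harmony or the RI, and the answer for "UnicodeBig" depends on the runtime, so no expected output is shown.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper for illustration only.
final class CharsetNameProbe {
    /** True if the java.lang path (String constructor) accepts the name. */
    static boolean acceptedByString(String name) {
        try {
            new String(new byte[0], name);
            return true;
        } catch (UnsupportedEncodingException | UnsupportedCharsetException e) {
            // The API declares UnsupportedEncodingException; the Harmony run
            // reported above surfaced UnsupportedCharsetException instead.
            return false;
        }
    }

    /** True if the java.nio path (Charset lookup) accepts the name. */
    static boolean acceptedByNio(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "UnicodeBig";
        System.out.println("String constructor:  " + acceptedByString(name));
        System.out.println("Charset.isSupported: " + acceptedByNio(name));
    }
}
```

Running it with different names on each runtime shows whether the two APIs agree.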
-- 
Tony Wu
China Software Development Lab, IBM

---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org


Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/17/06, Tony Wu <wu...@gmail.com> wrote:
>
> Thank you Andrew,
> I think I got the point. The j.l.String of RI uses the encoding of IO
> whereas Charset.forName uses another of NIO.


Exactly!

> And the new problem is: shall we follow the spec[1] to support the two
> suites of charset implementation? I just had a look and found we do
> not support some canonical names for the java.io and java.lang API, such as
> UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.


I think we have no choice, because the spec explicitly lists the basic
charset names for java.io, java.lang and nio, which include "UnicodeBig". So
the problem left is how, not whether. :-)

Mapping may solve this problem. We may map:
1. io/lang -> nio
2. nio -> io/lang
3. io/lang/nio -> icu
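
As a rough sketch of option 1 (io/lang -> nio), a small alias table could translate legacy java.io/java.lang names before the nio lookup. The class name and the table contents below are assumptions for illustration; only the two "Unmarked" names are listed, because they have no byte-order mark and so correspond directly to UTF-16BE and UTF-16LE.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical mapping helper; not Harmony's actual implementation.
final class LegacyCharsetNames {
    // Legacy java.io/java.lang name (lower-cased) -> java.nio canonical name.
    private static final Map<String, String> IO_TO_NIO = new HashMap<>();
    static {
        IO_TO_NIO.put("unicodebigunmarked", "UTF-16BE");
        IO_TO_NIO.put("unicodelittleunmarked", "UTF-16LE");
    }

    /** Returns the nio name for a legacy name, or the input if no mapping exists. */
    static String toNioName(String ioName) {
        String mapped = IO_TO_NIO.get(ioName.toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : ioName;
    }

    /** Resolves a name through the alias table, then through java.nio. */
    static Charset lookup(String ioName) {
        return Charset.forName(toNioName(ioName));
    }
}
```

Names that carry a byte-order mark, such as "UnicodeBig", need more than a rename, which is why they are left out of this table.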

BTW, does the current nio.charset implementation support "UnicodeBig"? There
is a small difference between "UnicodeBig" and "UTF-16BE":
UnicodeBig: Sixteen-bit Unicode Transformation Format, big-endian byte
order, with byte-order mark
UTF-16BE: Sixteen-bit Unicode Transformation Format, big-endian byte order
[1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
>
> On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > >
> > > On 10/17/06, Leo Li <li...@gmail.com> wrote:
> > > >
> > > > I think Harmony is more reasonable.
> > > >
> > > > As the spec says, if Charset.forName("UnicodeBig") throws
> > > > UnsupportedCharsetException, then no support for the named charset is
> > > > available in this instance of the Java virtual machine. Then how can we
> > > > get new String(b, "UnicodeBig") without an exception being thrown on
> > > > the same JVM? The spec for String(byte[] bytes, String charsetName)
> > > > also says that if the named charset is not supported,
> > > > UnsupportedEncodingException should be thrown.
> > >
> > >
> > > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support such
> > > a mapping in String and follow RI.
> > >
> >
> > You can find the encoding set in the spec. [1]
> >
> > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> >


-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 11/7/06, Tony Wu <wu...@gmail.com> wrote:
>
> Unlike the RI, our io/lang uses the same charset implementation (ICU)
> as nio. As you know, modifying ICU's code is not recommended. To fix
> this problem under the precondition I mentioned, I would have to write
> a BOM before every encoding operation and handle the BOM before every
> decoding operation, which would obviously break the structure of our
> existing io/lang implementation.
> So I think supplying a Harmony SPI is easier and clearer.


Makes sense. :)




-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
Unlike the RI, our io/lang uses the same charset implementation (ICU)
as nio. As you know, modifying ICU's code is not recommended. To fix
this problem under the precondition I mentioned, I would have to write
a BOM before every encoding operation and handle the BOM before every
decoding operation, which would obviously break the structure of our
existing io/lang implementation.
So I think supplying a Harmony SPI is easier and clearer.
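
For readers unfamiliar with the SPI route, a provider built on java.nio.charset.spi.CharsetProvider might look roughly like the sketch below. It is not the actual Harmony patch. The class names and the charset name "X-Sketch-UnicodeBig" are invented for illustration (a made-up name avoids clashing with names a runtime may already define), and the sketch assumes it is acceptable to delegate to the JDK's UTF-16 charset, whose encoder writes big-endian bytes preceded by a BOM.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical charset: big-endian UTF-16 with a byte-order mark.
final class SketchUnicodeBig extends Charset {
    SketchUnicodeBig() {
        super("X-Sketch-UnicodeBig", new String[0]);
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof SketchUnicodeBig || StandardCharsets.UTF_16.contains(cs);
    }

    @Override
    public CharsetDecoder newDecoder() {
        // Assumption: the UTF-16 decoder (optional BOM, big-endian default) is close enough.
        return StandardCharsets.UTF_16.newDecoder();
    }

    @Override
    public CharsetEncoder newEncoder() {
        // The UTF-16 encoder emits 0xFE 0xFF followed by big-endian code units.
        return StandardCharsets.UTF_16.newEncoder();
    }
}

// Hypothetical provider that answers only for the sketch charset.
final class SketchCharsetProvider extends CharsetProvider {
    private final Charset unicodeBig = new SketchUnicodeBig();

    @Override
    public Charset charsetForName(String name) {
        return unicodeBig.name().equalsIgnoreCase(name) ? unicodeBig : null;
    }

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(unicodeBig).iterator();
    }
}
```

For Charset.forName to find such a provider, it would normally be listed in a META-INF/services/java.nio.charset.spi.CharsetProvider file on the class path. That makes the name visible to nio as well, which is the concern raised earlier in the thread.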

On 11/7/06, Andrew Zhang <zh...@gmail.com> wrote:
> On 11/6/06, Tony Wu <wu...@gmail.com> wrote:
> >
> > Bad news: the ICU team refused to support UnicodeBig because it is not
> > available in nio.
> >
> > The good news is that I realized there is a smooth way to support these
> > charsets. I tried implementing an SPI that accepts the name "UnicodeBig",
> > and it worked. This way we could support any other charsets, and fix
> > bugs that the ICU team hesitates to fix. I think it also brings us
> > extensibility. Do you have any concerns about implementing a
> > Harmony SPI? I'll go on if no one objects.
>
>
> Hey Tony, if we only consider supporting UnicodeBig in io/lang, would
> things be simpler?
>
> On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > >
> > > > I think supporting UnicodeBig in nio is not a bug but a feature. And
> > > > the key point is: how can I get UnicodeBig supported in IO/Lang?
> > >
> > >
> > > If ICU/NIO supports "UnicodeBig", wouldn't IO/LANG support "UnicodeBig"
> > > as well?
> > >
> > > On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > > > >
> > > > > > The implementation is from ICU, so I think we'd better not wrap it
> > > > > > ourselves. I'll post to the ICU mailing list and ask if they can
> > > > > > help supply these legacy charsets.
> > > > >
> > > > >
> > > > > Hey Tony, please keep in mind that the following code[1] should print
> > > > > false and throw an UnsupportedCharsetException. If ICU provides
> > > > > "UnicodeBig" support, does it mean Harmony nio also supports
> > > > > "UnicodeBig"?
> > > > >
> > > > > [1]
> > > > > System.out.println(Charset.isSupported("UnicodeBig"));
> > > > > Charset.forName("UnicodeBig");
> > > > >
> > > > > On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Thank you all,
> > > > > > > > It is not just an issue about the name.
> > > > > > > > The precondition of mapping is that ICU really supports this
> > > > > > > > charset. AFAIK UnicodeBig is not implemented by ICU, refer to [1].
> > > > > > > > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a
> > > > > > > > workaround[2]?
> > > > > > >
> > > > > > >
> > > > > > > No, I don't think so. The only difference between "UnicodeBig" and
> > > > > > > "UTF-16BE" is with/without byte-order mark. So it should be easy to
> > > > > > > wrap "UTF-16BE" as "UnicodeBig" for java.io/java.lang. Just put
> > > > > > > 0xFE 0xFF at the beginning of the bytes and then encode the buffer
> > > > > > > as "UTF-16BE". Did I miss something?
> > > > > > >
> > > > > > > > [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> > > > > > > >
> > > > > > > > [2]
> > > > > > > > UTF-16
> > > > > > > > Sixteen-bit UCS Transformation Format, byte order identified by an
> > > > > > > > optional byte-order mark
> > > > > > > > UnicodeBig
> > > > > > > > Sixteen-bit Unicode Transformation Format, big-endian byte order,
> > > > > > > > with byte-order mark
> > > > > > > > UnicodeLittle
> > > > > > > > Sixteen-bit Unicode Transformation Format, little-endian byte order,
> > > > > > > > with byte-order mark
> > > > > > > >
> > > > > > > > On 10/17/06, Paulex Yang <pa...@gmail.com> wrote:
> > > > > > > > > Tony Wu wrote:
> > > > > > > > > > Thank you Andrew,
> > > > > > > > > > I think I got the point. The j.l.String of RI uses the
> > > > > > > > > > encoding of IO whereas Charset.forName uses another of NIO.
> > > > > > > > > >
> > > > > > > > > > And the new problem is: shall we follow the spec[1] to support
> > > > > > > > > > the two suites of charset implementation? I just had a look and
> > > > > > > > > > found we do not support some canonical names for the java.io
> > > > > > > > > > and java.lang API, such as UnicodeBigUnmarked,
> > > > > > > > > > UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > > > > > > > > There is such a charset name mapping in InputStreamReader. I
> > > > > > > > > think we have no choice but to support these legacy charset
> > > > > > > > > names; you may need some refactoring work to make these classes
> > > > > > > > > use the same mapping data.
> > > > > > > > > >
> > > > > > > > > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > > > > >


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 11/6/06, Tony Wu <wu...@gmail.com> wrote:
>
> Bad news: the ICU team refused to support UnicodeBig because it is not
> available in nio.
>
> The good news is that I have found a smooth way to support these
> charsets. I tried implementing an SPI that accepts the name "UnicodeBig",
> and it worked. This way we could support any other charsets and also fix
> the bug that the ICU team hesitated to fix. I think it also gives us
> extensibility. Does anyone have concerns about implementing a
> Harmony SPI? I'll go ahead if no one objects.


Hey Tony, would things be simpler if we only supported UnicodeBig in
io/lang?
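
For reference, an io/lang-only approach could look roughly like the sketch
below. It follows the suggestion made earlier in this thread (put 0xFE 0xFF
in front and use UTF-16BE for the rest) and never goes through
Charset.forName("UnicodeBig"). The class name UnicodeBigCodec and both method
names are hypothetical, and accepting input without a byte-order mark on
decode is an assumption, not something the spec text quoted in this thread
states.

```java
import java.nio.charset.Charset;

// Hypothetical helper for java.io/java.lang only; not an existing Harmony class.
final class UnicodeBigCodec {
    private static final Charset UTF_16BE = Charset.forName("UTF-16BE");

    private UnicodeBigCodec() {}

    // Encode as UTF-16BE and prepend the big-endian byte-order mark 0xFE 0xFF.
    static byte[] encode(String s) {
        byte[] body = s.getBytes(UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Skip a leading 0xFE 0xFF if present (assumption: the mark is optional
    // on input), then decode the remaining bytes as UTF-16BE.
    static String decode(byte[] bytes) {
        int offset = (bytes.length >= 2
                && bytes[0] == (byte) 0xFE
                && bytes[1] == (byte) 0xFF) ? 2 : 0;
        return new String(bytes, offset, bytes.length - offset, UTF_16BE);
    }
}
```

With a helper like this, String and the stream classes could special-case
the legacy name before falling back to the NIO lookup, so the behaviour of
Charset.forName would stay unchanged.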

> > > > > > > > >> > > > --
> > > > > > > > >> > > > Tony Wu
> > > > > > > > >> > > > China Software Development Lab, IBM
> > > > > > > > >> > > >
> > > > > > > > >> > > >
> > > > > > > > >>
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > >> > > > Terms of use :
> > > > > http://incubator.apache.org/harmony/mailing.html
> > > > > > > > >> > > > To unsubscribe, e-mail:
> > > > > > > > >> harmony-dev-unsubscribe@incubator.apache.org
> > > > > > > > >> > > > For additional commands, e-mail:
> > > > > > > > >> harmony-dev-help@incubator.apache.org
> > > > > > > > >> > > >
> > > > > > > > >> > > >
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > > --
> > > > > > > > >> > > Leo Li
> > > > > > > > >> > > China Software Development Lab, IBM
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> >
> > > > > > > > >> >
> > > > > > > > >> > --
> > > > > > > > >> > Best regards,
> > > > > > > > >> > Andrew Zhang
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Best regards,
> > > > > > > > >> Andrew Zhang
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > --
> > > > > > > > Paulex Yang
> > > > > > > > China Software Development Lab
> > > > > > > > IBM
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > > > > Terms of use :
> http://incubator.apache.org/harmony/mailing.html
> > > > > > > > To unsubscribe, e-mail:
> > > harmony-dev-unsubscribe@incubator.apache.org
> > > > > > > > For additional commands, e-mail:
> > > > > harmony-dev-help@incubator.apache.org
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > --
> > > > > > > Tony Wu
> > > > > > > China Software Development Lab, IBM
> > > > > > >
> > > > > > >
> > > ---------------------------------------------------------------------
> > > > > > > Terms of use :
> http://incubator.apache.org/harmony/mailing.html
> > > > > > > To unsubscribe, e-mail:
> > > harmony-dev-unsubscribe@incubator.apache.org
> > > > > > > For additional commands, e-mail:
> > > harmony-dev-help@incubator.apache.org
> > > > > > >
> > > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > > Andrew Zhang
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Tony Wu
> > > > > China Software Development Lab, IBM
> > > > >
> > > > >
> ---------------------------------------------------------------------
> > > > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > > > To unsubscribe, e-mail:
> harmony-dev-unsubscribe@incubator.apache.org
> > > > > For additional commands, e-mail:
> harmony-dev-help@incubator.apache.org
> > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best regards,
> > > > Andrew Zhang
> > > >
> > > >
> > >
> > >
> > > --
> > > Tony Wu
> > > China Software Development Lab, IBM
> > >
> > > ---------------------------------------------------------------------
> > > Terms of use : http://incubator.apache.org/harmony/mailing.html
> > > To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: harmony-dev-help@incubator.apache.org
> > >
> > >
> >
> >
> > --
> > Best regards,
> > Andrew Zhang
> >
> >
>
>
> --
> Tony Wu
> China Software Development Lab, IBM
>



-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
Bad news: the ICU team refused to support UnicodeBig because it is not
available in nio.

The good news is that I have found a smooth way to support these
charsets. I tried implementing an SPI that accepts the name "UnicodeBig",
and it worked. This way we could support any other charsets and also fix
the bug that the ICU team hesitated to fix. I think it also gives us
extensibility. Does anyone have concerns about implementing a
Harmony SPI? I'll go ahead if no one objects.
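
A minimal sketch of what such an SPI could look like, assuming the standard
java.nio.charset.spi.CharsetProvider mechanism. The class name
LegacyCharsetProvider and the canonical name "x-Harmony-UnicodeBig" are
invented for illustration; this is not the implementation mentioned above.
Delegating to the platform's UTF-16 coder is also an assumption: in Java it
writes a big-endian byte-order mark when encoding, but on decoding it would
accept a little-endian mark as well, which a strict UnicodeBig may not want.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

// Hypothetical provider; names are illustrative only. To be registered as a
// service, a real provider must be a public class in its own file.
class LegacyCharsetProvider extends CharsetProvider {

    // Presents the platform's UTF-16 coder under the legacy name "UnicodeBig".
    static final class UnicodeBigCharset extends Charset {
        private final Charset delegate = Charset.forName("UTF-16");

        UnicodeBigCharset() {
            super("x-Harmony-UnicodeBig", new String[] { "UnicodeBig" });
        }

        @Override
        public boolean contains(Charset cs) {
            return cs instanceof UnicodeBigCharset || delegate.contains(cs);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return delegate.newDecoder();
        }

        @Override
        public CharsetEncoder newEncoder() {
            return delegate.newEncoder();
        }
    }

    private final Charset unicodeBig = new UnicodeBigCharset();

    @Override
    public Charset charsetForName(String charsetName) {
        if ("UnicodeBig".equalsIgnoreCase(charsetName)
                || unicodeBig.name().equalsIgnoreCase(charsetName)) {
            return unicodeBig;
        }
        return null; // not ours; other providers may still answer
    }

    @Override
    public Iterator<Charset> charsets() {
        return Collections.singletonList(unicodeBig).iterator();
    }
}
```

A provider like this is normally registered by listing its class name in
META-INF/services/java.nio.charset.spi.CharsetProvider. Once it is
registered, Charset.isSupported("UnicodeBig") would return true, which
differs from the RI behaviour discussed elsewhere in this thread.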

On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> >
> > I think supporting UnicodeBig in nio would be a feature rather than a
> > bug fix. The key question is: how can I get UnicodeBig supported in
> > IO/Lang?
>
>
> If ICU/NIO supports "UnicodeBig", wouldn't IO/LANG support it as well?
>
> On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > >
> > > > The implementation is from ICU, so I think we'd better not wrap it
> > > > ourselves. I'll post to the ICU mailing list and ask whether they can
> > > > help supply these legacy charsets.
> > >
> > >
> > > Hey Tony, please keep in mind that the following code [1] should print
> > > false and throw an UnsupportedCharsetException. If ICU provides
> > > "UnicodeBig" support, does that mean Harmony nio also supports "UnicodeBig"?
> > >
> > > [1]
> > > System.out.println(Charset.isSupported("UnicodeBig"));
> > > Charset.forName("UnicodeBig");
> > >
> > > On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > > > >
> > > > > > Thank you all,
> > > > > > It is not just an issue of names.
> > > > > > The precondition for a mapping is that ICU actually supports this
> > > > > > charset. AFAIK UnicodeBig is not implemented by ICU; see [1].
> > > > > > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a
> > > > > > workaround [2]?
> > > > >
> > > > >
> > > > > No, I don't think so. The only difference between "UnicodeBig" and
> > > > > "UTF-16BE" is the presence or absence of a byte-order mark. So it
> > > > > should be easy to wrap "UTF-16BE" as "UnicodeBig" for
> > > > > java.io/java.lang. Just put 0xFE 0xFF at the beginning of the bytes
> > > > > and then encode the buffer as "UTF-16BE". Am I missing something?
> > > > >
> > > > > > [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> > > > > >
> > > > > > [2]
> > > > > > UTF-16
> > > > > > Sixteen-bit UCS Transformation Format, byte order identified by an
> > > > > > optional byte-order mark
> > > > > > UnicodeBig
> > > > > > Sixteen-bit Unicode Transformation Format, big-endian byte order,
> > > > > > with byte-order mark
> > > > > > UnicodeLittle
> > > > > > Sixteen-bit Unicode Transformation Format, little-endian byte order,
> > > > > > with byte-order mark
> > > > > >
> > > > > > On 10/17/06, Paulex Yang <pa...@gmail.com> wrote:
> > > > > > > Tony Wu wrote:
> > > > > > > > Thank you Andrew,
> > > > > > > > I think I got the point. The RI's j.l.String uses the IO
> > > > > > > > encoding set, whereas Charset.forName uses the NIO one.
> > > > > > > >
> > > > > > > > The new question is whether we should follow the spec [1] and
> > > > > > > > support both sets of charset implementations. I just had a look
> > > > > > > > and found that we do not support some canonical names for the
> > > > > > > > java.io and java.lang API, such as UnicodeBigUnmarked,
> > > > > > > > UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > > > > > > There is such a charset name mapping in InputStreamReader. I think
> > > > > > > we have no choice but to support these legacy charset names; you
> > > > > > > may need some refactoring to make these classes use the same
> > > > > > > mapping data.
> > > > > > > >
> > > > > > > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > > >
> > > > > > > > On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > > > >> On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > On 10/17/06, Leo Li <li...@gmail.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > I think Harmony is more reasonable.
> > > > > > > >> > >
> > > > > > > >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> > > > > > > >> > > UnsupportedCharsetException, then no support for the named
> > > > > > > >> > > charset is available in this instance of the Java virtual
> > > > > > > >> > > machine. Then how can we get new String(b, "UnicodeBig")
> > > > > > > >> > > without throwing UnsupportedCharsetException on the same
> > > > > > > >> > > JVM? The spec for String(byte[] bytes, String charsetName)
> > > > > > > >> > > also says that if the named charset is not supported,
> > > > > > > >> > > UnsupportedCharsetException should be thrown.
> > > > > > > >> >
> > > > > > > >> >
> > > > > > > >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better
> > > > > > > >> > support such a mapping in String and follow the RI.
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >> You can find the encoding set in the spec. [1]
> > > > > > > >>
> > > > > > > >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > > >>
> > > > > > > >>  On 10/17/06, Tony Wu <wu...@gmail.com> wrote:
> > > > > > > >> > > >
> > > > > > > >> > > > Hi all,
> > > > > > > >> > > > I found this when I tried to debug the failure tests of ant on
> > > > > > > >> > > > harmony. Note the output of testcases below.
> > > > > > > >> > > >
> > > > > > > >> > > > import java.io.UnsupportedEncodingException;
> > > > > > > >> > > > import java.nio.charset.Charset;
> > > > > > > >> > > > import junit.framework.TestCase;
> > > > > > > >> > > >
> > > > > > > >> > > > public class TestCharset extends TestCase {
> > > > > > > >> > > >    public void test1() throws UnsupportedEncodingException {
> > > > > > > >> > > >        byte[] b = new byte[] { 'a', 'b', 'c' };
> > > > > > > >> > > >        String s = new String(b, "UnicodeBig");
> > > > > > > >> > > >        assertEquals("abc", s);
> > > > > > > >> > > >    }
> > > > > > > >> > > >
> > > > > > > >> > > >    public void test2() {
> > > > > > > >> > > >        Charset.forName("UnicodeBig");
> > > > > > > >> > > >    }
> > > > > > > >> > > > }
> > > > > > > >> > > >
> > > > > > > >> > > > RI:
> > > > > > > >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> > > > > > > >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > > >> > > >
> > > > > > > >> > > > Harmony:
> > > > > > > >> > > > test1:java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > > >> > > > test2:
> > > > > > > >> > > > java.nio.charset.UnsupportedCharsetException: The unsupported charset
> > > > > > > >> > > > name is "UnicodeBig"
> > > > > > > >> > > >
> > > > > > > >> > > > seems RI can recognize the *UnicodeBig* in Constructor of j.l.String,
> > > > > > > >> > > > whereas Harmony does not support this alias at all.
> > > > > > > >> > > >
> > > > > > > >> > > > Do you have any concern about that?


-- 
Tony Wu
China Software Development Lab, IBM

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
>
> I think supporting UnicodeBig in nio would be a feature rather than a bug
> fix. The key question is: how can I get UnicodeBig supported in IO/Lang?


If ICU/NIO supports "UnicodeBig", wouldn't IO/LANG support it as well?
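
Whether the two layers really share one lookup can be checked directly. The
sketch below is a hypothetical diagnostic (CharsetProbe and its method names
are made up) that asks the NIO path and the java.lang path about the same
name. It catches UnsupportedCharsetException on the String path only because
the Harmony output quoted in this thread shows that exception escaping from
the constructor.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical diagnostic; compares the two lookup paths for one name.
final class CharsetProbe {
    private CharsetProbe() {}

    // NIO path: true if Charset.forName(name) would succeed.
    static boolean nioSupports(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false; // syntactically illegal names count as unsupported
        }
    }

    // java.lang path: true if the String constructor accepts the name.
    static boolean langSupports(String name) {
        try {
            new String(new byte[0], name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false; // the documented failure mode
        } catch (UnsupportedCharsetException e) {
            return false; // seen in the Harmony output quoted in this thread
        }
    }

    // One line per name, e.g. for printing in a loop over legacy names.
    static String report(String name) {
        return name + ": nio=" + nioSupports(name) + ", lang=" + langSupports(name);
    }
}
```

Calling System.out.println(CharsetProbe.report("UnicodeBig")) on the RI and
on Harmony would show whether the two paths agree on each runtime. The result
depends on the runtime, so no output is shown here.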

On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > >
> > > The implementation is from ICU, so I think we'd better not wrap it
> > > ourselves. I'll post to the ICU mailing list and ask whether they can
> > > help supply these legacy charsets.
> >
> >
> > Hey Tony, please keep in mind that the following code [1] should print
> > false and throw an UnsupportedCharsetException. If ICU provides
> > "UnicodeBig" support, does that mean Harmony nio also supports "UnicodeBig"?
> >
> > [1]
> > System.out.println(Charset.isSupported("UnicodeBig"));
> > Charset.forName("UnicodeBig");
> >
> > On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> > > > >
> > > > > Thank you all,
> > > > > It is not just an issue of names.
> > > > > The precondition for a mapping is that ICU actually supports this
> > > > > charset. AFAIK UnicodeBig is not implemented by ICU; see [1].
> > > > > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a
> > > > > workaround [2]?
> > > >
> > > >
> > > > No, I don't think so. The only difference between "UnicodeBig" and
> > > > "UTF-16BE" is the presence or absence of a byte-order mark. So it
> > > > should be easy to wrap "UTF-16BE" as "UnicodeBig" for
> > > > java.io/java.lang. Just put 0xFE 0xFF at the beginning of the bytes
> > > > and then encode the buffer as "UTF-16BE". Am I missing something?
> > > >
> > > > > [1] http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> > > > >
> > > > > [2]
> > > > > UTF-16
> > > > > Sixteen-bit UCS Transformation Format, byte order identified by an
> > > > > optional byte-order mark
> > > > > UnicodeBig
> > > > > Sixteen-bit Unicode Transformation Format, big-endian byte order,
> > > > > with byte-order mark
> > > > > UnicodeLittle
> > > > > Sixteen-bit Unicode Transformation Format, little-endian byte order,
> > > > > with byte-order mark
> > > > >
> > > > > On 10/17/06, Paulex Yang <pa...@gmail.com> wrote:
> > > > > > Tony Wu wrote:
> > > > > > > Thank you Andrew,
> > > > > > > I think I got the point. The RI's j.l.String uses the IO
> > > > > > > encoding set, whereas Charset.forName uses the NIO one.
> > > > > > >
> > > > > > > The new question is whether we should follow the spec [1] and
> > > > > > > support both sets of charset implementations. I just had a look
> > > > > > > and found that we do not support some canonical names for the
> > > > > > > java.io and java.lang API, such as UnicodeBigUnmarked,
> > > > > > > UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
> > > > > > There is such a charset name mapping in InputStreamReader. I think
> > > > > > we have no choice but to support these legacy charset names; you
> > > > > > may need some refactoring to make these classes use the same
> > > > > > mapping data.
> > > > > > >
> > > > > > > [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > >
> > > > > > > On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > > >> On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On 10/17/06, Leo Li <li...@gmail.com> wrote:
> > > > > > >> > >
> > > > > > >> > > I think Harmony is more reasonable.
> > > > > > >> > >
> > > > > > >> > > As the spec says, if Charset.forName("UnicodeBig") throws
> > > > > > >> > > UnsupportedCharsetException, then no support for the named
> > > > > > >> > > charset is available in this instance of the Java virtual
> > > > > > >> > > machine. Then how can we get new String(b, "UnicodeBig")
> > > > > > >> > > without throwing UnsupportedCharsetException on the same
> > > > > > >> > > JVM? The spec for String(byte[] bytes, String charsetName)
> > > > > > >> > > also says that if the named charset is not supported,
> > > > > > >> > > UnsupportedCharsetException should be thrown.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > UNICODEBIG is a Java alias for UTF-16BE. I think we'd better
> > > > > > >> > support such a mapping in String and follow the RI.
> > > > > > >> >
> > > > > > >>
> > > > > > >> You can find the encoding set in the spec. [1]
> > > > > > >>
> > > > > > >> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
> > > > > > >>
> > > > > > >>  On 10/17/06, Tony Wu <wu...@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > Hi all,
> > > > > > >> > > > I found this when I tried to debug the failure tests of ant on
> > > > > > >> > > > harmony. Note the output of testcases below.
> > > > > > >> > > >
> > > > > > >> > > > import java.io.UnsupportedEncodingException;
> > > > > > >> > > > import java.nio.charset.Charset;
> > > > > > >> > > > import junit.framework.TestCase;
> > > > > > >> > > >
> > > > > > >> > > > public class TestCharset extends TestCase {
> > > > > > >> > > >    public void test1() throws UnsupportedEncodingException {
> > > > > > >> > > >        byte[] b = new byte[] { 'a', 'b', 'c' };
> > > > > > >> > > >        String s = new String(b, "UnicodeBig");
> > > > > > >> > > >        assertEquals("abc", s);
> > > > > > >> > > >    }
> > > > > > >> > > >
> > > > > > >> > > >    public void test2() {
> > > > > > >> > > >        Charset.forName("UnicodeBig");
> > > > > > >> > > >    }
> > > > > > >> > > > }
> > > > > > >> > > >
> > > > > > >> > > > RI:
> > > > > > >> > > > test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> > > > > > >> > > > test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > >> > > >
> > > > > > >> > > > Harmony:
> > > > > > >> > > > test1:java.nio.charset.UnsupportedCharsetException: UnicodeBig
> > > > > > >> > > > test2:
> > > > > > >> > > > java.nio.charset.UnsupportedCharsetException: The unsupported charset
> > > > > > >> > > > name is "UnicodeBig"
> > > > > > >> > > >
> > > > > > >> > > > seems RI can recognize the *UnicodeBig* in Constructor of j.l.String,
> > > > > > >> > > > whereas Harmony does not support this alias at all.
> > > > > > >> > > >
> > > > > > >> > > > Do you have any concern about that?


-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
I think supporting UnicodeBig in NIO would be a feature, not a bug. The
key question is: how can I get UnicodeBig supported in IO/Lang?

On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> >
> > The implementation is from ICU, so I think we'd better not wrap it
> > ourselves. I'll post to the ICU mailing list and ask whether they can
> > supply these legacy charsets.
>
>
> Hey Tony, please keep in mind that the following code[1] should print false
> and then throw an UnsupportedCharsetException. If ICU provides "UnicodeBig"
> support, does that mean Harmony NIO will also support "UnicodeBig"?
>
> [1]
> System.out.println(Charset.isSupported("UnicodeBig"));
> Charset.forName("UnicodeBig");


-- 
Tony Wu
China Software Development Lab, IBM


Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
>
> The implementation is from ICU, so I think we'd better not wrap it
> ourselves. I'll post to the ICU mailing list and ask whether they can
> supply these legacy charsets.


Hey Tony, please keep in mind that the following code[1] should print false
and then throw an UnsupportedCharsetException. If ICU provides "UnicodeBig"
support, does that mean Harmony NIO will also support "UnicodeBig"?

[1]
System.out.println(Charset.isSupported("UnicodeBig"));
Charset.forName("UnicodeBig");
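
For anyone who wants to run the check in [1] as a standalone program, a
minimal sketch could look like this. The class name CharsetProbe and the
helper probe() are made up for illustration; only Charset.isSupported and
Charset.forName come from the real API. What it prints for "UnicodeBig"
depends on the class library it runs on, so no expected output is given.

```java
import java.nio.charset.Charset;
import java.nio.charset.UnsupportedCharsetException;

public class CharsetProbe {
    // Hypothetical helper: reports how the NIO lookup treats a charset name.
    public static String probe(String name) {
        boolean supported = Charset.isSupported(name);
        try {
            Charset cs = Charset.forName(name);
            return name + ": isSupported=" + supported
                    + ", canonical=" + cs.name();
        } catch (UnsupportedCharsetException e) {
            return name + ": isSupported=" + supported
                    + ", forName threw " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Legacy java.io/java.lang name discussed in this thread.
        System.out.println(probe("UnicodeBig"));
        // A charset every Java platform is required to support.
        System.out.println(probe("UTF-16BE"));
    }
}
```

Running it on both the RI and Harmony gives a quick side-by-side view of
whether the NIO lookup accepts the legacy name.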



-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
The implementation is from ICU, so I think we'd better not wrap it
ourselves. I'll post to the ICU mailing list and ask whether they can
supply these legacy charsets.

On 10/19/06, Andrew Zhang <zh...@gmail.com> wrote:
> On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
> >
> > Thank you all,
> > This is not just a naming issue.
> > A name mapping only works if ICU actually supports the charset.
> > AFAIK UnicodeBig is not implemented by ICU; see [1].
> > Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround[2]?
>
>
> No, I don't think so. The only difference between "UnicodeBig" and
> "UTF-16BE" is whether a byte-order mark is present. So it should be easy to
> wrap "UTF-16BE" as "UnicodeBig" for java.io/java.lang: encode the buffer as
> "UTF-16BE" and put 0xFE 0xFF at the beginning of the resulting bytes. Am I
> missing something?
>
> > [1]http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
> >
> > [2]
> > UTF-16
> > Sixteen-bit UCS Transformation Format, byte order identified by an
> > optional byte-order mark
> > UnicodeBig
> > Sixteen-bit Unicode Transformation Format, big-endian byte order,
> > with byte-order mark
> > UnicodeLittle
> > Sixteen-bit Unicode Transformation Format, little-endian byte order,
> > with byte-order mark


-- 
Tony Wu
China Software Development Lab, IBM


Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/19/06, Tony Wu <wu...@gmail.com> wrote:
>
> Thank you all,
> This is not just a naming issue.
> A name mapping only works if ICU actually supports the charset.
> AFAIK UnicodeBig is not implemented by ICU; see [1].
> Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround[2]?


No, I don't think so. The only difference between "UnicodeBig" and
"UTF-16BE" is whether a byte-order mark is present. So it should be easy to
wrap "UTF-16BE" as "UnicodeBig" for java.io/java.lang: encode the buffer as
"UTF-16BE" and put 0xFE 0xFF at the beginning of the resulting bytes. Am I
missing something?
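
A rough sketch of that wrapping idea, under a few assumptions: the class
name UnicodeBigSketch and its encode/decode helpers are hypothetical, it
uses java.nio.charset.StandardCharsets (which needs a newer Java than the
one discussed here), and on the decode side it only recognizes a big-endian
mark. It is not how the RI or ICU implement "UnicodeBig".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class UnicodeBigSketch {
    // Big-endian byte-order mark, U+FEFF written as two bytes.
    private static final byte BOM_HI = (byte) 0xFE;
    private static final byte BOM_LO = (byte) 0xFF;

    // Encode as UTF-16BE, then prepend the byte-order mark.
    public static byte[] encode(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE);
        byte[] out = new byte[body.length + 2];
        out[0] = BOM_HI;
        out[1] = BOM_LO;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    // Decode big-endian bytes, skipping a leading byte-order mark if present.
    public static String decode(byte[] bytes) {
        boolean hasBom = bytes.length >= 2
                && bytes[0] == BOM_HI && bytes[1] == BOM_LO;
        int offset = hasBom ? 2 : 0;
        return new String(bytes, offset, bytes.length - offset,
                StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        byte[] encoded = encode("abc");
        System.out.println(Arrays.toString(encoded));
        System.out.println(decode(encoded));
    }
}
```

The point of keeping this in a small wrapper is that the NIO side stays
untouched: Charset.forName never has to learn the legacy name.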

> [1]http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co
>
> [2]
> UTF-16
> Sixteen-bit UCS Transformation Format, byte order identified by an
> optional byte-order mark
> UnicodeBig
> Sixteen-bit Unicode Transformation Format, big-endian byte order,
> with byte-order mark
> UnicodeLittle
> Sixteen-bit Unicode Transformation Format, little-endian byte order,
> with byte-order mark


-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
Thank you all,
This is not just a naming issue.
A name mapping only works if ICU actually supports the charset.
AFAIK UnicodeBig is not implemented by ICU; see [1].
Shall we map UnicodeBig and UnicodeLittle to UTF-16 as a workaround[2]?

[1]http://dev.icu-project.org/cgi-bin/viewcvs.cgi/icu/source/data/mappings/convrtrs.txt?view=co

[2]
UTF-16
 Sixteen-bit UCS Transformation Format, byte order identified by an
 optional byte-order mark
UnicodeBig
 Sixteen-bit Unicode Transformation Format, big-endian byte order,
 with byte-order mark
UnicodeLittle
 Sixteen-bit Unicode Transformation Format, little-endian byte order,
 with byte-order mark
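
A minimal sketch of what the proposed UTF-16 workaround would and would not preserve. It assumes a modern JDK (StandardCharsets did not exist in the Java 5 era of this thread), and the class and method names are hypothetical. When decoding, the JDK's UTF-16 charset honours either byte-order mark and defaults to big-endian without one; when encoding, it always writes a big-endian mark, so a UnicodeLittle writer mapped to UTF-16 would likely not produce little-endian output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Harmony or the RI.
final class Utf16BomDemo {
    // Decodes with UTF-16: a leading BOM selects the byte order,
    // and big-endian is assumed when no BOM is present.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // Encodes with UTF-16: the JDK writes a big-endian BOM (FE FF)
    // followed by big-endian code units.
    static byte[] encodeUtf16(String text) {
        return text.getBytes(StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] bigWithBom = { (byte) 0xFE, (byte) 0xFF, 0x00, 0x61 };
        byte[] littleWithBom = { (byte) 0xFF, (byte) 0xFE, 0x61, 0x00 };
        byte[] bigNoBom = { 0x00, 0x61 };

        // All three inputs decode to the single character 'a'.
        System.out.println(decodeUtf16(bigWithBom));
        System.out.println(decodeUtf16(littleWithBom));
        System.out.println(decodeUtf16(bigNoBom));

        // Encoding yields four bytes: the BOM plus one code unit.
        System.out.println(encodeUtf16("a").length);
    }
}
```

So the workaround looks reasonable for decoding marked input, but encoding under the UnicodeLittle name would need separate handling.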



-- 
Tony Wu
China Software Development Lab, IBM



Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Paulex Yang <pa...@gmail.com>.
Tony Wu wrote:
> Thank you Andrew,
> I think I get the point. In the RI, j.l.String uses the java.io encoding
> names, whereas Charset.forName uses the NIO ones.
>
> The new question is whether we should follow the spec [1] and support both
> sets of charset names. I just had a look and found that we do not support
> some canonical names for the java.io and java.lang APIs, such as
> UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.
There is such a charset name mapping in InputStreamReader. I think we
have no choice but to support these legacy charset names; you may need
to do some refactoring so that these classes share the same mapping data.
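
One way the shared mapping data could look, as a rough sketch only. The class name, method names, and table layout are hypothetical and are not Harmony's actual code. The two "Unmarked" entries follow the java.io/java.nio name pairs listed in the encoding spec [1]; the UnicodeBig and UnicodeLittle entries encode the UTF-16 workaround floated in this thread and are an assumption, not settled behaviour.

```java
import java.nio.charset.Charset;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that String, InputStreamReader, etc. could share.
final class LegacyCharsetNames {
    // Keys are lower-cased legacy java.io/java.lang names.
    private static final Map<String, String> LEGACY_TO_NIO = new HashMap<>();

    static {
        LEGACY_TO_NIO.put("unicodebigunmarked", "UTF-16BE");
        LEGACY_TO_NIO.put("unicodelittleunmarked", "UTF-16LE");
        // Assumption: workaround discussed in this thread, lossy for encoding.
        LEGACY_TO_NIO.put("unicodebig", "UTF-16");
        LEGACY_TO_NIO.put("unicodelittle", "UTF-16");
    }

    // Returns the NIO name for a legacy name, or the input unchanged
    // when no legacy mapping is known.
    static String toNioName(String name) {
        String mapped = LEGACY_TO_NIO.get(name.toLowerCase(Locale.ROOT));
        return mapped != null ? mapped : name;
    }

    // Resolves a charset by legacy or NIO name; still throws
    // UnsupportedCharsetException for names that remain unknown.
    static Charset forName(String name) {
        return Charset.forName(toNioName(name));
    }
}
```

With a single table like this, new String(b, name) and InputStreamReader could both call LegacyCharsetNames.forName(name), while Charset.forName itself stays strict about NIO names.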
>
> [1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html


-- 
Paulex Yang
China Software Development Lab
IBM




Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Tony Wu <wu...@gmail.com>.
Thank you Andrew,
I think I get the point. In the RI, j.l.String uses the java.io encoding
names, whereas Charset.forName uses the NIO ones.

The new question is whether we should follow the spec [1] and support both
sets of charset names. I just had a look and found that we do not support
some canonical names for the java.io and java.lang APIs, such as
UnicodeBigUnmarked, UnicodeLittleUnmarked, UnicodeBig, UnicodeLittle, etc.

[1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html



-- 
Tony Wu
China Software Development Lab, IBM



Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/17/06, Andrew Zhang <zh...@gmail.com> wrote:
>
>
>
> On 10/17/06, Leo Li <li...@gmail.com> wrote:
> >
> > I think Harmony is more reasonable.
> >
> > As the spec says, if Charset.forName("UnicodeBig") throws
> > UnsupportedCharsetException, then no support for the named charset is
> > available in this instance of the Java virtual machine. Then how can we
> > get new String(b, "UnicodeBig") without throwing
> > UnsupportedCharsetException on the same JVM? The spec for
> > String(byte[] bytes, String charsetName) also says that if the named
> > charset is not supported, an UnsupportedEncodingException should be
> > thrown.
>
>
> UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support such a
> mapping in String and follow the RI.
>

You can find the set of supported encodings in the spec [1].

[1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html





-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Andrew Zhang <zh...@gmail.com>.
On 10/17/06, Leo Li <li...@gmail.com> wrote:
>
> I think Harmony is more reasonable.
>
> As the spec says, if Charset.forName("UnicodeBig") throws
> UnsupportedCharsetException, then no support for the named charset is
> available in this instance of the Java virtual machine. Then how can we
> get new String(b, "UnicodeBig") without throwing
> UnsupportedCharsetException on the same JVM? The spec for
> String(byte[] bytes, String charsetName) also says that if the named
> charset is not supported, an UnsupportedEncodingException should be
> thrown.


UNICODEBIG is a Java alias for UTF-16BE. I think we'd better support such a
mapping in String and follow the RI.



-- 
Best regards,
Andrew Zhang

Re: [classlib][luni][charset]Strange behavior of UnicodeBig

Posted by Leo Li <li...@gmail.com>.
I think Harmony is more reasonable.

As the spec says, if Charset.forName("UnicodeBig") throws
UnsupportedCharsetException, then no support for the named charset is
available in this instance of the Java virtual machine. Then how can we get
new String(b, "UnicodeBig") without throwing UnsupportedCharsetException on
the same JVM? The spec for String(byte[] bytes, String charsetName) also says
that if the named charset is not supported, an UnsupportedEncodingException
should be thrown.



On 10/17/06, Tony Wu <wu...@gmail.com> wrote:
>
> Hi all,
> I found this when I tried to debug the failure tests of ant on
> harmony. Note the output of testcases below.
>
> import java.io.UnsupportedEncodingException;
> import java.nio.charset.Charset;
> import junit.framework.TestCase;
>
> public class TestCharset extends TestCase {
>    public void test1() throws UnsupportedEncodingException {
>        byte[] b = new byte[] { 'a', 'b', 'c' };
>        String s = new String(b, "UnicodeBig");
>        assertEquals("abc", s);
>    }
>
>    public void test2() {
>        Charset.forName("UnicodeBig");
>    }
> }
>
> RI:
> test1: junit.framework.ComparisonFailure: expected:<abc> but was:<>
> test2: java.nio.charset.UnsupportedCharsetException: UnicodeBig
>
> Harmony:
> test1:java.nio.charset.UnsupportedCharsetException: UnicodeBig
> test2:
> java.nio.charset.UnsupportedCharsetException: The unsupported charset
> name is "UnicodeBig"
>
> seems RI can recognize the *UnicodeBig* in Constructor of j.l.String,
> whereas Harmony does not support this alias at all.
>
> Do you have any concern about that?


-- 
Leo Li
China Software Development Lab, IBM