
Posted to commits@harmony.apache.org by "Dmitry M. Kononov (JIRA)" <ji...@apache.org> on 2006/04/05 12:28:44 UTC

[jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

     [ http://issues.apache.org/jira/browse/HARMONY-308?page=all ]

Dmitry M. Kononov updated HARMONY-308:
--------------------------------------

    Attachment: test9.java

> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
> -----------------------------------------------------------------------------------------------------------------------
>
>          Key: HARMONY-308
>          URL: http://issues.apache.org/jira/browse/HARMONY-308
>      Project: Harmony
>         Type: Bug

>   Components: Classlib
>     Reporter: Dmitry M. Kononov
>  Attachments: test9.java
>
> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order.
> Please look at the output of a test case that I am going to attach.
> RI:
> ---8<---
> bb.order()=BE
> cb.order()=LE
> result.order()=BE
> The result is
> result = java.nio.HeapByteBuffer[pos=0 lim=28 cap=52]
> bb = java.nio.HeapByteBuffer[pos=0 lim=28 cap=28]
> The result is OK.
> ---8<---
> Harmony (At revision 391577):
> ---8<---
> bb.order()=BE
> cb.order()=LE
> result.order()=BE
> The result is
> result = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
> bb = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
> The result is not correct.
> 0 elements are not equal (ffffffff != fffffffe)
> 1 elements are not equal (fffffffe != ffffffff)
> 2 elements are not equal (1b != 4)
> 3 elements are not equal (4 != 1b)
> 4 elements are not equal (35 != 4)
> 5 elements are not equal (4 != 35)
> 6 elements are not equal (42 != 4)
> 7 elements are not equal (4 != 42)
> 8 elements are not equal (3e != 4)
> 9 elements are not equal (4 != 3e)
> 10 elements are not equal (20 != 0)
> 11 elements are not equal (0 != 20)
> 12 elements are not equal (32 != 4)
> 13 elements are not equal (4 != 32)
> 14 elements are not equal (20 != 0)
> 15 elements are not equal (0 != 20)
> 16 elements are not equal (20 != 4)
> 17 elements are not equal (4 != 20)
> 18 elements are not equal (3e != 4)
> 19 elements are not equal (4 != 3e)
> 20 elements are not equal (41 != 4)
> 21 elements are not equal (4 != 41)
> 22 elements are not equal (41 != 4)
> 23 elements are not equal (4 != 41)
> 24 elements are not equal (38 != 4)
> 25 elements are not equal (4 != 38)
> 26 elements are not equal (38 != 4)
> 27 elements are not equal (4 != 38)
> ---8<---
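The mismatch dump prints each byte as an 8-digit hex value such as ffffffff. A plausible explanation is sign extension when a byte is widened to int. This assumes test9.java passes each byte straight to Integer.toHexString; its source is not shown in this thread. The sketch below, with a hypothetical class HexDump, also checks that the two arrays differ only by swapping each adjacent byte pair. In other words, the same UTF-16 code units are serialized in the opposite byte order.

```java
public class HexDump {
    // A byte passed to Integer.toHexString is widened to int with sign
    // extension, so (byte) 0xFF becomes -1 and prints as "ffffffff".
    static String signExtended(byte b) {
        return Integer.toHexString(b);
    }

    // Masking with 0xFF discards the sign extension: (byte) 0xFF prints as "ff".
    static String unsigned(byte b) {
        return Integer.toHexString(b & 0xFF);
    }

    // True if 'a' equals 'b' with every adjacent byte pair swapped, i.e. the
    // same 16-bit code units serialized in the opposite byte order.
    static boolean isPairwiseSwap(byte[] a, byte[] b) {
        if (a.length != b.length || a.length % 2 != 0) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i ^ 1]) { // i ^ 1 maps 0<->1, 2<->3, ...
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(signExtended((byte) 0xFF)); // ffffffff
        System.out.println(unsigned((byte) 0xFF));     // ff
        // First four bytes of the Harmony result and of the expected result above.
        byte[] harmony = {(byte) 0xFF, (byte) 0xFE, 0x1B, 0x04};
        byte[] expected = {(byte) 0xFE, (byte) 0xFF, 0x04, 0x1B};
        System.out.println(isPairwiseSwap(harmony, expected)); // true
    }
}
```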

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Richard Liang <ri...@gmail.com>.


Dmitry M. Kononov wrote:
> Hi Richard,
>
> On 4/7/06, Richard Liang <ri...@gmail.com> wrote:
>> You're right. :-) Now I agree with you that Harmony is not compliant
>> with the specification. We will discuss with our Charset Provider - ICU
>> to determine how to fix this issue. Thanks a lot.
>
> Is there any progress?
>
Hello Dmitry,

I reported a bug to ICU [1], but unfortunately I have not received any
feedback :-(

[1] http://bugs.icu-project.org/cgi-bin/icu-bugs?findid=5179&go=Go
> Thanks.

-- 
Richard Liang
China Software Development Lab, IBM 



---------------------------------------------------------------------
Terms of use : http://incubator.apache.org/harmony/mailing.html
To unsubscribe, e-mail: harmony-dev-unsubscribe@incubator.apache.org
For additional commands, e-mail: harmony-dev-help@incubator.apache.org

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by "Dmitry M. Kononov" <dm...@gmail.com>.

Hi Richard,

On 4/7/06, Richard Liang <ri...@gmail.com> wrote:
> You're right. :-) Now I agree with you that Harmony is not compliant
> with the specification. We will discuss with our Charset Provider - ICU
> to determine how to fix this issue. Thanks a lot.

Is there any progress?

Thanks.
-- 
Dmitry M. Kononov
Intel Managed Runtime Division


Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Richard Liang <ri...@gmail.com>.

Dmitry M. Kononov wrote:
> Hi Richard,
>
> On 4/6/06, Richard Liang <ri...@gmail.com> wrote:
>
>   
>> Dmitry M. Kononov wrote:
>>     
>>> As you correctly noticed, the cause of this issue is that Harmony uses
>>> little-endian byte order if the UTF-16 sequence being encoded has no
>>> byte-order mark. However, the spec explicitly covers this case:
>>>
>>> "When decoding, the UTF-16 charset interprets a byte-order mark to indicate
>>> the byte order of the stream but defaults to big-endian if there is no
>>> byte-order mark; when encoding, it uses big-endian byte order and writes a
>>> big-endian byte-order mark."
>>>
>>>
>>>       
>> Hello Dmitry,
>>
>> Yes, Harmony and RI use different byte orders, but since both Harmony
>> and RI write a byte-order mark (U+FEFF), I think both are compliant
>> with the specification. So could we regard Harmony-308 as "not a
>> bug"?
>>     
>
>
> I think Harmony's behavior in this case is inconsistent with the java spec,
> since the spec defines the expected behavior explicitly:
> "when encoding, it uses big-endian byte order and writes a big-endian
> byte-order mark." But Harmony's encode() returns bytes in the little-endian
> order.
>
> I don't quite understand why you think Harmony follows the spec
> correctly in this case. :)
> I am really sorry if I have misunderstood you.
>
>   
You're right. :-) Now I agree with you that Harmony is not compliant 
with the specification. We will discuss with our Charset Provider - ICU 
to determine how to fix this issue. Thanks a lot.
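While the provider is being fixed, one possible application-level workaround is to produce the spec-mandated UTF-16 output explicitly. This was not proposed in the thread and is only a sketch. It writes the big-endian BOM FE FF and then encodes with UTF-16BE, which by spec writes no BOM of its own. The class name ExplicitBigEndianUtf16 is hypothetical, and StandardCharsets requires Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitBigEndianUtf16 {
    // Builds UTF-16 output as the Charset documentation describes it:
    // a big-endian byte-order mark (FE FF) followed by big-endian code units.
    static byte[] encode(String s) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE); // UTF-16BE writes no BOM
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) 0xFE;
        out[1] = (byte) 0xFF;
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        // The first four characters of the HARMONY-308 test data.
        byte[] bytes = encode("\u041b\u0435\u0442\u043e");
        System.out.println(Arrays.toString(bytes)); // [-2, -1, 4, 27, 4, 53, 4, 66, 4, 62]
    }
}
```

The printed values match the first ten entries of the expected byte array in the test case (254 and 255 print as -2 and -1 because Java bytes are signed).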

> From the test case attached to HARMONY-308:
>
> 1) We have a char array that has no byte-order mark:
>     private static final char chars[] = {
>         0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
>         0x0441,0x0438,0x0438};
>
> 2) We have the byte array that we expect encode() to return:
>     private static final byte bytes[] = {
>         (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
>         (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
>         (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
>         (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};
>
> Please note that, according to the spec, we expect encode() to return bytes
> in big-endian byte order, so we expect the FEFF byte-order mark.
> Do you agree that this expectation is correct and consistent with the spec?
>
> Thanks.
> --
> Dmitry M. Kononov
> Intel Managed Runtime Division
>
>   


-- 
Richard Liang
China Software Development Lab, IBM

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Tim Ellison <t....@gmail.com>.

Just to clarify...

Dmitry M. Kononov wrote:
<snip>
> Harmony and IBM jdk1.4.2 use ICU to provide java.nio.charset
> functionality, so they behave the same way in our case. This behavior
> does not follow the Java documentation (or I don't understand something :)).

No, IBM's JDK 1.4.2 does not use the same ICU code as Harmony.

(The precise relationship is complex, but the summary is that a long
time ago some ICU code was adopted into the Sun code base.  Harmony is
using a significantly different release.  I believe both use data from
the CLDR.)

Regards,
Tim

> Thus, we probably need to ask ICU to fix this, don't we?
> 
> What do you think, does it make sense to file a bug against ICU?
> Thanks.
> --
> Dmitry M. Kononov
> Intel Managed Runtime Division
> 

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.


Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by "Dmitry M. Kononov" <dm...@gmail.com>.

Hi Andrew,

On 4/7/06, Andrew Zhang <zh...@gmail.com> wrote:
>
> Hello, Dmitry,
>
> I agree with you that Harmony's behavior is not consistent with the Java spec.

:)

> As you may know, java.nio.charset.Charset wraps ICU to implement
> encode/decode operations.
>
> The following description is cited from ICU
> (http://icu.sourceforge.net/userguide/unicodeBasics.html):
>
> *The names "UTF-16" and "UTF-32" are ambiguous. Depending on context, they
> refer either to character encoding forms where 16/32-bit words are processed
> and are naturally stored in the platform endianness, or they refer to the
> IANA-registered charset names, i.e., to character encoding schemes or byte
> serializations. In addition to simple byte serialization, the charsets with
> these names also use optional Byte Order Marks (see Serialized Formats
> <http://icu.sourceforge.net/userguide/unicodeBasics.html#serialized_formats>
> below).*
>

Thanks, it's a good point. However, I found the following text in the same
document, which suggests that there is a bug in ICU. Please note the last
sentence, which I believe describes our case exactly:

"In UTF-16 and UTF-32, where the signature also distinguishes between
big-endian and little-endian byte orders, it is also called a byte order
mark (BOM). The signature works for UTF-16 since the code point that has the
byte-swapped encoding, U+FFFE, will never be a valid Unicode character. (It
is a "non-character" code point.) In Internet protocols, if an encoding
specification of "UTF-16" or "UTF-32" is used, it is expected that there is
a signature byte sequence (BOM) that identifies the byte ordering, which is
not the case for the encoding scheme/charset names with "BE" or "LE".
If text is specified to be encoded in the UTF-16 or UTF-32 charset and does
not begin with a BOM, then it must be interpreted as UTF-16BE or UTF-32BE,
respectively."
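The ICU passage and the Charset documentation agree on the decoding side. A UTF-16 stream without a BOM is read as big-endian, and the explicit UTF-16BE charset does not treat a leading FE FF as a byte-order mark. The sketch below checks that behavior on a modern, spec-compliant JDK. The class name Utf16DecodeCheck is hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;

public class Utf16DecodeCheck {
    // Decodes raw bytes with the generic "UTF-16" charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] noBom = {0x04, 0x1B};                            // U+041B, no BOM
        byte[] beBom = {(byte) 0xFE, (byte) 0xFF, 0x04, 0x1B}; // FE FF + U+041B
        byte[] leBom = {(byte) 0xFF, (byte) 0xFE, 0x1B, 0x04}; // FF FE + U+041B

        // "UTF-16" honors a BOM and defaults to big-endian without one,
        // so all three decode to the single character U+041B.
        System.out.println(decode(noBom).equals("\u041b")); // true
        System.out.println(decode(beBom).equals("\u041b")); // true
        System.out.println(decode(leBom).equals("\u041b")); // true

        // UTF-16BE does not consume the BOM; it decodes FE FF as U+FEFF.
        String be = new String(beBom, StandardCharsets.UTF_16BE);
        System.out.println(be.equals("\ufeff\u041b")); // true
    }
}
```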

Harmony and IBM jdk1.4.2 use ICU to provide java.nio.charset
functionality, so they behave the same way in our case. This behavior
does not follow the Java documentation (or I don't understand something :)).
Thus, we probably need to ask ICU to fix this, don't we?

What do you think, does it make sense to file a bug against ICU?
Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Andrew Zhang <zh...@gmail.com>.

Hello, Dmitry,

I agree with you that Harmony's behavior is not consistent with the Java spec.

As you may know, java.nio.charset.Charset wraps ICU to implement
encode/decode operations.

The following description is cited from ICU
(http://icu.sourceforge.net/userguide/unicodeBasics.html):

*The names "UTF-16" and "UTF-32" are ambiguous. Depending on context, they
refer either to character encoding forms where 16/32-bit words are processed
and are naturally stored in the platform endianness, or they refer to the
IANA-registered charset names, i.e., to character encoding schemes or byte
serializations. In addition to simple byte serialization, the charsets with
these names also use optional Byte Order Marks (see Serialized Formats
<http://icu.sourceforge.net/userguide/unicodeBasics.html#serialized_formats>
below).*

The result of running your test case on IBM jdk 1.4.2 is exactly the same
as on Harmony, and I assume IBM jdk 1.4.2 has passed the TCK.

Therefore, IMO, both behaviours are acceptable.

What's your opinion?

On 4/7/06, Dmitry M. Kononov <dm...@gmail.com> wrote:
>
> Hi Richard,
>
> On 4/6/06, Richard Liang <ri...@gmail.com> wrote:
>
> > Dmitry M. Kononov wrote:
> > > As you correctly noticed, the cause of this issue is that Harmony uses
> > > little-endian byte order if the UTF-16 sequence being encoded has no
> > > byte-order mark. However, the spec explicitly covers this case:
> > >
> > > "When decoding, the UTF-16 charset interprets a byte-order mark to indicate
> > > the byte order of the stream but defaults to big-endian if there is no
> > > byte-order mark; when encoding, it uses big-endian byte order and writes a
> > > big-endian byte-order mark."
> > >
> > >
> > Hello Dmitry,
> >
> > Yes, Harmony and RI use different byte orders, but since both Harmony
> > and RI write a byte-order mark (U+FEFF), I think both are compliant
> > with the specification. So could we regard Harmony-308 as "not a
> > bug"?
>
>
> I think Harmony's behavior in this case is inconsistent with the java spec,
> since the spec defines the expected behavior explicitly:
> "when encoding, it uses big-endian byte order and writes a big-endian
> byte-order mark." But Harmony's encode() returns bytes in the little-endian
> order.
>
> I don't quite understand why you think Harmony follows the spec
> correctly in this case. :)
> I am really sorry if I have misunderstood you.
>
> From the test case attached to HARMONY-308:
>
> 1) We have a char array that has no byte-order mark:
>    private static final char chars[] = {
>        0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
>        0x0441,0x0438,0x0438};
>
> 2) We have the byte array that we expect encode() to return:
>    private static final byte bytes[] = {
>        (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
>        (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
>        (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
>        (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};
>
> Please note that, according to the spec, we expect encode() to return bytes
> in big-endian byte order, so we expect the FEFF byte-order mark.
> Do you agree that this expectation is correct and consistent with the spec?
>
> Thanks.
> --
> Dmitry M. Kononov
> Intel Managed Runtime Division
>
>
--
Andrew Zhang
China Software Development Lab, IBM

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by "Dmitry M. Kononov" <dm...@gmail.com>.

Hi Richard,

On 4/6/06, Richard Liang <ri...@gmail.com> wrote:

> Dmitry M. Kononov wrote:
> > As you correctly noticed, the cause of this issue is that Harmony uses
> > little-endian byte order if the UTF-16 sequence being encoded has no
> > byte-order mark. However, the spec explicitly covers this case:
> >
> > "When decoding, the UTF-16 charset interprets a byte-order mark to indicate
> > the byte order of the stream but defaults to big-endian if there is no
> > byte-order mark; when encoding, it uses big-endian byte order and writes a
> > big-endian byte-order mark."
> >
> >
> Hello Dmitry,
>
> Yes, Harmony and RI use different byte orders, but since both Harmony
> and RI write a byte-order mark (U+FEFF), I think both are compliant
> with the specification. So could we regard Harmony-308 as "not a
> bug"?

I think Harmony's behavior in this case is inconsistent with the java spec,
since the spec defines the expected behavior explicitly:
"when encoding, it uses big-endian byte order and writes a big-endian
byte-order mark." But Harmony's encode() returns bytes in the little-endian
order.

I don't quite understand why you think Harmony follows the spec
correctly in this case. :)
I am really sorry if I have misunderstood you.

From the test case attached to HARMONY-308:

1) We have a char array that has no byte-order mark:
    private static final char chars[] = {
        0x041b,0x0435,0x0442,0x043e,0x0020,0x0432,0x0020,0x0420,0x043e,0x0441,
        0x0441,0x0438,0x0438};

2) We have the byte array that we expect encode() to return:
    private static final byte bytes[] = {
        (byte)254,(byte)255,(byte)  4,(byte) 27,(byte)  4,(byte) 53,(byte)  4,
        (byte) 66,(byte)  4,(byte) 62,(byte)  0,(byte) 32,(byte)  4,(byte) 50,
        (byte)  0,(byte) 32,(byte)  4,(byte) 32,(byte)  4,(byte) 62,(byte)  4,
        (byte) 65,(byte)  4,(byte) 65,(byte)  4,(byte) 56,(byte)  4,(byte) 56};

Please note that, according to the spec, we expect encode() to return bytes
in big-endian byte order, so we expect the FEFF byte-order mark.
Do you agree that this expectation is correct and consistent with the spec?
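The source of test9.java is not reproduced in this thread, so the following is only a hypothetical reconstruction of the check described above. It encodes the 13 characters with Charset.encode(CharBuffer) and compares the bytes between position and limit with the expected array. Comparing only the remaining bytes matters because, as the RI output shows (lim=28 cap=52), the returned buffer's capacity can exceed its limit. The class name Utf16EncodeCheck is illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class Utf16EncodeCheck {
    // The characters from the test case; note there is no leading U+FEFF.
    static final char[] CHARS = {
        0x041b, 0x0435, 0x0442, 0x043e, 0x0020, 0x0432, 0x0020,
        0x0420, 0x043e, 0x0441, 0x0441, 0x0438, 0x0438};

    // Expected "UTF-16" output: big-endian BOM FE FF, then big-endian code units.
    static final byte[] EXPECTED = {
        (byte) 0xFE, (byte) 0xFF, 4, 27, 4, 53, 4, 66, 4, 62, 0, 32, 4, 50,
        0, 32, 4, 32, 4, 62, 4, 65, 4, 65, 4, 56, 4, 56};

    // Encodes CHARS and copies out only the bytes between position and limit.
    static byte[] encode(String charsetName) {
        ByteBuffer result = Charset.forName(charsetName).encode(CharBuffer.wrap(CHARS));
        byte[] out = new byte[result.remaining()];
        result.get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] actual = encode("UTF-16");
        System.out.println(Arrays.equals(actual, EXPECTED)
                ? "The result is OK."
                : "The result is not correct.");
    }
}
```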

Thanks.
--
Dmitry M. Kononov
Intel Managed Runtime Division

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Richard Liang <ri...@gmail.com>.

Dmitry M. Kononov wrote:
> Hi Richard,
>
> On 4/6/06, Richard Liang <ri...@gmail.com> wrote:
>   
>> And as described in the Unicode standard, UTF-16 can be serialized as either
>> big-endian or little-endian, and a leading byte sequence corresponding to
>> U+FEFF is used to distinguish the two byte orders.
>>
>> If the leading byte sequence is FE FF, the whole byte sequence is
>> regarded as big-endian.
>> If the leading byte sequence is FF FE, the whole byte sequence is
>> regarded as little-endian.
>>
>> From your test, we can see that Harmony uses little-endian, while RI uses
>> big-endian.
>>
>> I'm sorry if my explanation confused you :-)
>>     
>
>
> I absolutely agree with you. Thanks a lot for your explanation, and sorry
> for my overly brief description of the issue.
>
> As you correctly noticed, the cause of this issue is that Harmony uses
> little-endian byte order if the UTF-16 sequence being encoded has no
> byte-order mark. However, the spec explicitly covers this case:
>
> "When decoding, the UTF-16 charset interprets a byte-order mark to indicate
> the byte order of the stream but defaults to big-endian if there is no
> byte-order mark; when encoding, it uses big-endian byte order and writes a
> big-endian byte-order mark."
>
>   
Hello Dmitry,

Yes, Harmony and RI use different byte orders, but since both Harmony
and RI write a byte-order mark (U+FEFF), I think both are compliant
with the specification. So could we regard Harmony-308 as "not a
bug"?
> Thanks.
>
>   
>> --
>> Dmitry M. Kononov
>> Intel Managed Runtime Division
>>     
>
>   


-- 
Richard Liang
China Software Development Lab, IBM

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by "Dmitry M. Kononov" <dm...@gmail.com>.

Hi Richard,

On 4/6/06, Richard Liang <ri...@gmail.com> wrote:
>
> And as described in the Unicode standard, UTF-16 can be serialized as either
> big-endian or little-endian, and a leading byte sequence corresponding to
> U+FEFF is used to distinguish the two byte orders.
>
> If the leading byte sequence is FE FF, the whole byte sequence is
> regarded as big-endian.
> If the leading byte sequence is FF FE, the whole byte sequence is
> regarded as little-endian.
>
> From your test, we can see that Harmony uses little-endian, while RI uses
> big-endian.
>
> I'm sorry if my explanation confused you :-)


I absolutely agree with you. Thanks a lot for your explanation, and sorry
for my overly brief description of the issue.

As you correctly noticed, the cause of this issue is that Harmony uses
little-endian byte order if the UTF-16 sequence being encoded has no
byte-order mark. However, the spec explicitly covers this case:

"When decoding, the UTF-16 charset interprets a byte-order mark to indicate
the byte order of the stream but defaults to big-endian if there is no
byte-order mark; when encoding, it uses big-endian byte order and writes a
big-endian byte-order mark."
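The encoding half of that rule can be checked mechanically. In this sketch, a compliant implementation should encode "A" as FE FF 00 41: a big-endian BOM followed by exactly the bytes UTF-16BE produces. The behavior reported for Harmony would start with FF FE instead. The class and method names are hypothetical, and StandardCharsets needs Java 7 or later.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf16SpecRule {
    // True if the bytes begin with the big-endian byte-order mark FE FF.
    static boolean hasBigEndianBom(byte[] b) {
        return b.length >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF;
    }

    // The documented encoding rule: a big-endian BOM, then the same bytes
    // that UTF-16BE (which writes no BOM) produces for the same text.
    static boolean followsEncodingRule(String s, byte[] encoded) {
        byte[] body = s.getBytes(StandardCharsets.UTF_16BE);
        return hasBigEndianBom(encoded)
                && Arrays.equals(Arrays.copyOfRange(encoded, 2, encoded.length), body);
    }

    public static void main(String[] args) {
        byte[] encoded = "A".getBytes(StandardCharsets.UTF_16);
        System.out.println(Arrays.toString(encoded));          // [-2, -1, 0, 65] on a compliant JDK
        System.out.println(followsEncodingRule("A", encoded)); // true on a compliant JDK
    }
}
```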

Thanks.

> --
> Dmitry M. Kononov
> Intel Managed Runtime Division

Re: [jira] Updated: (HARMONY-308) java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset

Posted by Richard Liang <ri...@gmail.com>.

Dmitry M. Kononov (JIRA) wrote:
>      [ http://issues.apache.org/jira/browse/HARMONY-308?page=all ]
>
> Dmitry M. Kononov updated HARMONY-308:
> --------------------------------------
>
>     Attachment: test9.java
>
>   
>> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order in Harmony and RI for the UTF-16 charset
>> -----------------------------------------------------------------------------------------------------------------------
>>
>>          Key: HARMONY-308
>>          URL: http://issues.apache.org/jira/browse/HARMONY-308
>>      Project: Harmony
>>         Type: Bug
>>     
>
>   
>>   Components: Classlib
>>     Reporter: Dmitry M. Kononov
>>  Attachments: test9.java
>>
>> java.nio.charset.Charset.encode(CharBuffer) returns bytes in a different order.
>> Please look at the output of a test case that I am going to attach.
>> RI:
>> ---8<---
>> bb.order()=BE
>> cb.order()=LE
>> result.order()=BE
>> The result is
>> result = java.nio.HeapByteBuffer[pos=0 lim=28 cap=52]
>> bb = java.nio.HeapByteBuffer[pos=0 lim=28 cap=28]
>> The result is OK.
>> ---8<---
>> Harmony (At revision 391577):
>> ---8<---
>> bb.order()=BE
>> cb.order()=LE
>> result.order()=BE
>> The result is
>> result = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
>> bb = java.nio.ReadWriteHeapByteBuffer, status: capacity=28 position=0 limit=28
>> The result is not correct.
>> 0 elements are not equal (ffffffff != fffffffe)
>> 1 elements are not equal (fffffffe != ffffffff)
>> 2 elements are not equal (1b != 4)
>> 3 elements are not equal (4 != 1b)
>> 4 elements are not equal (35 != 4)
>> 5 elements are not equal (4 != 35)
>> 6 elements are not equal (42 != 4)
>> 7 elements are not equal (4 != 42)
>> 8 elements are not equal (3e != 4)
>> 9 elements are not equal (4 != 3e)
>> 10 elements are not equal (20 != 0)
>> 11 elements are not equal (0 != 20)
>> 12 elements are not equal (32 != 4)
>> 13 elements are not equal (4 != 32)
>> 14 elements are not equal (20 != 0)
>> 15 elements are not equal (0 != 20)
>> 16 elements are not equal (20 != 4)
>> 17 elements are not equal (4 != 20)
>> 18 elements are not equal (3e != 4)
>> 19 elements are not equal (4 != 3e)
>> 20 elements are not equal (41 != 4)
>> 21 elements are not equal (4 != 41)
>> 22 elements are not equal (41 != 4)
>> 23 elements are not equal (4 != 41)
>> 24 elements are not equal (38 != 4)
>> 25 elements are not equal (4 != 38)
>> 26 elements are not equal (38 != 4)
>> 27 elements are not equal (4 != 38)
>> ---8<---
>>     
>
>   
Hello Dmitry,

IMHO, you may be mixing up two different "byte order" concepts :-)

1. the byte order of ByteBuffer (ByteBuffer.order)
2. the byte order of byte sequences encoded by some CharsetEncoder, such 
as UTF-16

First, let's see the byte order for java.nio.ByteBuffer.

As described in the spec of java.nio.ByteBuffer:

This class defines six categories of operations upon byte buffers:
....
* Absolute and relative *get* and *put* methods that read and write 
values of other primitive types, translating them to and from sequences 
of bytes in a particular byte order;
.....

For example,
        ByteBuffer bb = ByteBuffer.allocate(10);
        bb.order(ByteOrder.LITTLE_ENDIAN);
        bb.putChar('A');
The bytes stored in the ByteBuffer will be: 41 00

        ByteBuffer bb = ByteBuffer.allocate(10);
        bb.order(ByteOrder.BIG_ENDIAN);
        bb.putChar('A');
The bytes stored in the ByteBuffer will be: 00 41
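The two fragments above can be combined into one runnable sketch; the class and method names are illustrative. It reads the stored bytes back with absolute get(int). That method returns single bytes and is therefore unaffected by the buffer's order, so the output shows exactly what putChar() wrote.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

public class ByteBufferOrderDemo {
    // Writes one char with putChar() using the given buffer order and
    // returns the two bytes that were stored.
    static byte[] putCharBytes(char c, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocate(10);
        bb.order(order);
        bb.putChar(c);
        return new byte[] {bb.get(0), bb.get(1)}; // absolute single-byte reads
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(putCharBytes('A', ByteOrder.LITTLE_ENDIAN))); // [65, 0]
        System.out.println(Arrays.toString(putCharBytes('A', ByteOrder.BIG_ENDIAN)));    // [0, 65]
    }
}
```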

Second, there are also byte order issues in some character encoding 
schemes, such as UTF-16, UTF-16LE and UTF-16BE.

For example,

Character 'A' can be encoded in UTF-16LE: 41 00
Character 'A' can be encoded in UTF-16BE: 00 41

If we use the java.nio.charset APIs, the encoded byte sequences are stored
in a ByteBuffer. But **here** ByteBuffer.order() has no relationship to
the encoded byte sequence: a UTF-16LE encoded byte sequence can still be
stored in a BIG_ENDIAN ordered ByteBuffer.
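This independence is easy to observe. In the sketch below, both encoders return a buffer whose order() is the default BIG_ENDIAN, yet the bytes follow the charset. Reading the UTF-16LE bytes back with getChar(), which honors the buffer's BIG_ENDIAN order, yields U+4100 rather than 'A'. The class name is illustrative, and StandardCharsets needs Java 7 or later.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class EncodedOrderVsBufferOrder {
    // Copies the bytes between position and limit without moving the
    // caller's position.
    static byte[] bytesOf(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer le = StandardCharsets.UTF_16LE.encode(CharBuffer.wrap("A"));
        ByteBuffer be = StandardCharsets.UTF_16BE.encode(CharBuffer.wrap("A"));

        System.out.println(le.order() + " " + be.order()); // BIG_ENDIAN BIG_ENDIAN
        System.out.println(Arrays.toString(bytesOf(le)));  // [65, 0]
        System.out.println(Arrays.toString(bytesOf(be)));  // [0, 65]

        // getChar() applies the buffer's BIG_ENDIAN order to the bytes 41 00.
        System.out.printf("U+%04X%n", (int) le.getChar(le.position())); // U+4100
    }
}
```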

And as described in the Unicode standard, UTF-16 can be serialized as either
big-endian or little-endian, and a leading byte sequence corresponding to
U+FEFF is used to distinguish the two byte orders.

If the leading byte sequence is FE FF, the whole byte sequence is
regarded as big-endian.
If the leading byte sequence is FF FE, the whole byte sequence is
regarded as little-endian.
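The BOM rule above, combined with the big-endian default from the Charset documentation, can be written as a small helper. The class and method names are hypothetical. This sketch only inspects the first two bytes and does not decode anything.

```java
import java.nio.ByteOrder;

public class BomSniffer {
    // Returns the byte order announced by a leading UTF-16 BOM, or null if
    // the sequence does not start with one.
    static ByteOrder bomOrder(byte[] b) {
        if (b.length >= 2) {
            if (b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) return ByteOrder.BIG_ENDIAN;
            if (b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) return ByteOrder.LITTLE_ENDIAN;
        }
        return null;
    }

    // Order a "UTF-16" decoder should use: the BOM's order if present,
    // otherwise big-endian, per the java.nio.charset.Charset documentation.
    static ByteOrder effectiveOrder(byte[] b) {
        ByteOrder fromBom = bomOrder(b);
        return fromBom != null ? fromBom : ByteOrder.BIG_ENDIAN;
    }

    public static void main(String[] args) {
        byte[] ri = {(byte) 0xFE, (byte) 0xFF, 0x04, 0x1B};      // start of the RI output
        byte[] harmony = {(byte) 0xFF, (byte) 0xFE, 0x1B, 0x04}; // start of the Harmony output
        byte[] none = {0x04, 0x1B};                              // no BOM
        System.out.println(effectiveOrder(ri));      // BIG_ENDIAN
        System.out.println(effectiveOrder(harmony)); // LITTLE_ENDIAN
        System.out.println(effectiveOrder(none));    // BIG_ENDIAN
    }
}
```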

From your test, we can see that Harmony uses little-endian, while RI uses
big-endian.

I'm sorry if my explanation confused you :-)

-- 
Richard Liang
China Software Development Lab, IBM