Posted to commits@harmony.apache.org by "Paulex Yang (JIRA)" <ji...@apache.org> on 2007/08/15 09:55:32 UTC
[jira] Commented: (HARMONY-4196) [classlib][luni] InputStreamReader can't handle UnicodeBig encoding

    [ https://issues.apache.org/jira/browse/HARMONY-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12519889 ] 

Paulex Yang commented on HARMONY-4196:
--------------------------------------

This is yet another historical/canonical encoding name issue in the Java
platform: java.io and java.lang keep old, non-standard canonical names for
the Unicode encodings alongside the java.nio names. Here is the mapping for
Java SE 5: [1], and here is the one for Java SE 6: [2].

The difference between "UnicodeBig" and "UnicodeBigUnmarked" (i.e., UTF-16BE),
according to the descriptions in the tables [1][2], is that UnicodeBig has a
BOM (0xFEFF for big endian). The same difference applies to UnicodeLittle and
UnicodeLittleUnmarked.
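
To make the marked/unmarked distinction concrete, here is a small sketch using
the java.nio charsets UTF-16 and UTF-16BE on a current JDK. The class and
method names (BomDemo, decodeUtf16, decodeUtf16Be) are made up for
illustration and are not part of Harmony or the RI.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Big-endian BOM (FE FF) followed by 'A' (00 41).
    static final byte[] MARKED_BE = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};
    // Little-endian BOM (FF FE) followed by 'A' (41 00).
    static final byte[] MARKED_LE = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00};

    // UTF-16 inspects the leading BOM, picks the byte order, and drops the BOM.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    // UTF-16BE has a fixed byte order; a leading FE FF is decoded as U+FEFF.
    static String decodeUtf16Be(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf16(MARKED_BE).length());   // 1: just "A"
        System.out.println(decodeUtf16(MARKED_LE).length());   // 1: just "A"
        System.out.println(decodeUtf16Be(MARKED_BE).length()); // 2: U+FEFF plus "A"
    }
}
```

So a decoder for a "marked" name has to consume the BOM, while the "unmarked"
one treats those two bytes as ordinary character data.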

My suggestion is to just map "UnicodeBig" and "UnicodeLittle" to "utf-16"
in the constructors of InputStreamReader and OutputStreamWriter, because
utf-16 can recognize the BOM and adapt to the byte stream accordingly. We
may also need to map the other java.io canonical names to their java.nio
names (currently there is only a reverse map for this). I haven't tested
whether that is necessary, though.
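
A rough sketch of the reader side of that mapping, written against the public
API rather than Harmony's internals. The class LegacyNameSketch and the
helpers toNioName and readAll are hypothetical names, and the
case-insensitive match is an assumption; this is not the actual patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;

public class LegacyNameSketch {
    // Hypothetical helper: translate the two legacy java.io names to the
    // java.nio name "UTF-16"; any other name is passed through unchanged.
    static String toNioName(String name) {
        if ("UnicodeBig".equalsIgnoreCase(name)
                || "UnicodeLittle".equalsIgnoreCase(name)) {
            return "UTF-16";
        }
        return name;
    }

    // Decode the bytes through InputStreamReader using the mapped name.
    static String readAll(byte[] bytes, String legacyName) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), toNioName(legacyName))) {
            StringBuilder out = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] big = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x41};    // BOM + 'A', big endian
        byte[] little = {(byte) 0xFF, (byte) 0xFE, 0x41, 0x00}; // BOM + 'A', little endian
        System.out.println(readAll(big, "UnicodeBig"));       // A
        System.out.println(readAll(little, "UnicodeLittle")); // A
    }
}
```

One thing to check on the writer side: when encoding, Java's UTF-16 charset
writes a big-endian BOM and big-endian data, so the same mapping in
OutputStreamWriter may not give little-endian output for "UnicodeLittle".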

[1] http://java.sun.com/j2se/1.5.0/docs/guide/intl/encoding.doc.html
[2] http://java.sun.com/javase/6/docs/technotes/guides/intl/encoding.doc.html



-- 
Paulex Yang
China Software Development laboratory
IBM


> [classlib][luni] InputStreamReader can't handle UnicodeBig encoding
> -------------------------------------------------------------------
>
>                 Key: HARMONY-4196
>                 URL: https://issues.apache.org/jira/browse/HARMONY-4196
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Vasily Zakharov
>            Assignee: Alexei Zakharov
>            Priority: Minor
>         Attachments: Harmony-4196-InputStreamReader_diagnostics.patch
>
>
> Consider the following simple test:
> import java.io.*;
> public class Test {
>     public static void main(String[] args) {
>         try {
>             new InputStreamReader(new ByteArrayInputStream(new byte[] {(byte) 0xFE, (byte) 0xFF}), "UnicodeBig");
>             System.out.println("SUCCESS");
>         } catch (Throwable e) {
>             System.out.println("FAIL:");
>             e.printStackTrace(System.out);
>         }
>     }
> }
> Output on RI:
> SUCCESS
> Output on Harmony (both DRL VM and IBM VM):
> FAIL:
> java.io.UnsupportedEncodingException
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:104)
>         at Test.main(Test.java:6)
> Additional investigation shows that the cause for this exception is:
> java.nio.charset.UnsupportedCharsetException: The unsupported charset name is "UnicodeBig".
>         at java.nio.charset.Charset.forName(Charset.java:564)
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
>         at Test.main(Test.java:5)
> Interesting point is, the direct call to Charset.forName("UnicodeBig") causes the same exception on RI also.
> So it seems the problem is not in Charset but in InputStreamReader itself.
