Posted to commits@harmony.apache.org by "Li Jing Qin (JIRA)" <ji...@apache.org> on 2009/03/05 07:41:59 UTC

[jira] Commented: (HARMONY-4196) [classlib][luni] InputStreamReader can't handle UnicodeBig encoding

    [ https://issues.apache.org/jira/browse/HARMONY-4196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679092#action_12679092 ] 

Li Jing Qin commented on HARMONY-4196:
--------------------------------------

Hey guys, I am running the EUT tests for 3.5, and this issue also blocks a test case there, so I have decided to fix it.
I agree with Paulex that UnicodeBig and UnicodeLittle should be mapped to UTF-16. Here is a similar test:
public final static byte[] BOM_UTF_16BE = {(byte) 0xFE, (byte) 0xFF};
	
	public static void printByteArray(byte[] array) {
		System.out.println("LEN: " + array.length);
		for (byte b : array) {
			System.out.print(Character.forDigit(((b & 0xF0) >> 4), 16));
			System.out.print(Character.forDigit((b & 0x0F), 16));
			System.out.print(" ");
		}
		System.out.println();
	}
	
	public static InputStream getInputStream(byte[][] contents) {
		int size = 0;
		// computes final array size 
		for (int i = 0; i < contents.length; i++)
			size += contents[i].length;
		byte[] full = new byte[size];
		int fullIndex = 0;
		// concatenates all byte arrays
		for (int i = 0; i < contents.length; i++)
			for (int j = 0; j < contents[i].length; j++)
				full[fullIndex++] = contents[i][j];
		return new ByteArrayInputStream(full);
	}
	
	public static void main(String[] args) throws Exception {
		String XML_ROOT_ELEMENT_NO_DECL = "<org.eclipse.core.runtime.tests.root-element/>";
		try {
			byte[] bArray = XML_ROOT_ELEMENT_NO_DECL.getBytes("UTF-16BE");
			printByteArray(bArray);
		} catch (Exception e) {
			e.printStackTrace();
		}
		
		InputStreamReader reader = new InputStreamReader(getInputStream(new byte[][] {BOM_UTF_16BE, XML_ROOT_ELEMENT_NO_DECL.getBytes("UTF-16BE")}), "UnicodeBig");
		StringBuilder sb = new StringBuilder();
		int c = -1;
		while ((c = reader.read()) != -1) {
			sb.append((char)c);
		}
		System.out.println("GET:" + sb);
	}

If we change "UnicodeBig" to "UTF-16", Harmony parses the stream correctly.
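
For reference, here is a minimal self-contained sketch of the working case, assuming a standard Java runtime in which the "UTF-16" decoder uses a leading byte-order mark to pick the byte order. The class and method names (Utf16BomDemo, decode, withBigEndianBom) are made up for illustration and are not part of Harmony or of the test above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

class Utf16BomDemo {
    // Decodes the given bytes with the named charset and returns the text.
    static String decode(byte[] bytes, String charsetName) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName);
        try {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } finally {
            reader.close();
        }
    }

    // Returns the big-endian BOM (FE FF) followed by the UTF-16BE bytes of text.
    static byte[] withBigEndianBom(String text) throws IOException {
        byte[] body = text.getBytes("UTF-16BE");
        byte[] full = new byte[body.length + 2];
        full[0] = (byte) 0xFE;
        full[1] = (byte) 0xFF;
        System.arraycopy(body, 0, full, 2, body.length);
        return full;
    }

    public static void main(String[] args) throws IOException {
        String xml = "<org.eclipse.core.runtime.tests.root-element/>";
        // With "UTF-16" the decoder consumes the BOM and returns only the text.
        System.out.println("GET:" + decode(withBigEndianBom(xml), "UTF-16"));
    }
}
```

On a runtime that behaves as assumed, the decoded string should equal the original XML text, with the BOM consumed rather than returned as a character.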

There are two ways to fix this:
1. Add the mapping in InputStreamReader and OutputStreamWriter.
2. Add the mapping in Charset.forName(), which will let Charset support UnicodeBig and UnicodeLittle.

I would prefer fix 2. Any comments are appreciated.
A patch will be attached later.
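
To make fix 2 concrete, here is a rough sketch of the idea written as a standalone helper rather than as a change to Charset itself. LegacyCharsetNames, normalize and forName are hypothetical names used only for illustration. One caveat: "UTF-16" picks the byte order from the BOM when decoding and falls back to big-endian when there is none, so mapping UnicodeLittle to it relies on the input carrying a BOM.

```java
import java.nio.charset.Charset;

class LegacyCharsetNames {
    // Hypothetical helper: rewrites the two legacy names to "UTF-16"
    // and leaves every other name untouched.
    static String normalize(String charsetName) {
        if ("UnicodeBig".equalsIgnoreCase(charsetName)
                || "UnicodeLittle".equalsIgnoreCase(charsetName)) {
            return "UTF-16";
        }
        return charsetName;
    }

    // Looks up a charset after normalizing legacy names.
    static Charset forName(String charsetName) {
        return Charset.forName(normalize(charsetName));
    }

    public static void main(String[] args) {
        System.out.println(forName("UnicodeBig").name());    // UTF-16
        System.out.println(forName("unicodelittle").name()); // UTF-16
        System.out.println(forName("utf-8").name());         // UTF-8
    }
}
```

Fix 1 would apply the same normalization inside the reader and writer constructors only, which would leave Charset.forName() itself unchanged.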


> [classlib][luni] InputStreamReader can't handle UnicodeBig encoding
> -------------------------------------------------------------------
>
>                 Key: HARMONY-4196
>                 URL: https://issues.apache.org/jira/browse/HARMONY-4196
>             Project: Harmony
>          Issue Type: Bug
>          Components: Classlib
>            Reporter: Vasily Zakharov
>            Assignee: Alexei Zakharov
>            Priority: Minor
>         Attachments: Harmony-4196-InputStreamReader_diagnostics.patch
>
>
> Consider the following simple test:
> import java.io.*;
> public class Test {
>     public static void main(String[] args) {
>         try {
>             new InputStreamReader(new ByteArrayInputStream(new byte[] {(byte) 0xFE, (byte) 0xFF}), "UnicodeBig");
>             System.out.println("SUCCESS");
>         } catch (Throwable e) {
>             System.out.println("FAIL:");
>             e.printStackTrace(System.out);
>         }
>     }
> }
> Output on RI:
> SUCCESS
> Output on Harmony (both DRL VM and IBM VM):
> FAIL:
> java.io.UnsupportedEncodingException
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:104)
>         at Test.main(Test.java:6)
> Additional investigation shows that the cause for this exception is:
> java.nio.charset.UnsupportedCharsetException: The unsupported charset name is "UnicodeBig".
>         at java.nio.charset.Charset.forName(Charset.java:564)
>         at java.io.InputStreamReader.<init>(InputStreamReader.java:99)
>         at Test.main(Test.java:5)
> Interesting point is, the direct call to Charset.forName("UnicodeBig") causes the same exception on RI also.
> So it seems the problem is not in Charset but in InputStreamReader itself.
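
The difference described at the end of the report can be probed directly. The following is a minimal sketch, assuming nothing beyond the standard java.io and java.nio.charset APIs; EncodingNameProbe and its methods are hypothetical names. It only reports whether each lookup path accepts a name, and the result for the legacy names may vary between runtimes and versions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

class EncodingNameProbe {
    // True if the java.nio.charset lookup recognizes the name.
    static boolean charsetKnows(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            return false;
        }
    }

    // True if an InputStreamReader can be constructed with the name.
    static boolean readerAccepts(String name) {
        byte[] bom = {(byte) 0xFE, (byte) 0xFF};
        try {
            new InputStreamReader(new ByteArrayInputStream(bom), name).close();
            return true;
        } catch (IOException e) {
            // UnsupportedEncodingException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"UTF-16", "UnicodeBig", "UnicodeLittle"}) {
            System.out.println(name + ": Charset=" + charsetKnows(name)
                    + ", InputStreamReader=" + readerAccepts(name));
        }
    }
}
```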

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Commented: (HARMONY-4196) [classlib][luni] InputStreamReader can't handle UnicodeBig encoding

Posted by Charles Lee <li...@gmail.com>.
Here is what the patch looks like:

diff --git modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/io/InputStreamReaderTest.java modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/io/InputStreamReaderTest.java
index 5edc277..88a8da7 100644
--- modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/io/InputStreamReaderTest.java
+++ modules/luni/src/test/api/common/org/apache/harmony/luni/tests/java/io/InputStreamReaderTest.java
@@ -55,6 +55,12 @@ public class InputStreamReaderTest extends TestCase {
                 bytes = new byte[] { '\u001b', '$', 'B', '6', 'e', 'B', 'h',
                         '\u001b', '(', 'B' };
                 break;
+            case 3:
+                bytes = new byte[] { (byte) 0xff, (byte) 0xfe };
+                break;
+            case 4:
+                bytes = new byte[] { (byte) 0xfe, (byte) 0xff };
+                break;
             }
             count = bytes.length;
         }
@@ -97,6 +103,8 @@ public class InputStreamReaderTest extends TestCase {

     private InputStreamReader reader;

+    private InputStreamReader inUTF16;
+
     private final String source = "This is a test message with Unicode character. \u4e2d\u56fd is China's name in Chinese";

     /*
@@ -246,6 +254,20 @@ public class InputStreamReaderTest extends TestCase {
         assertEquals(Charset.forName(reader2.getEncoding()), Charset
                 .forName("utf-8"));
         reader2.close();
+        try {
+            InputStream streamIn16 = new LimitedByteArrayInputStream(3);
+            inUTF16 = new InputStreamReader(streamIn16, "UnicodeLittle");
+            inUTF16.close();
+        } catch (UnsupportedEncodingException e) {
+            fail ("Should Support UnicodeLittle");
+        }
+        try {
+            InputStream streamIn16 = new LimitedByteArrayInputStream(4);
+            inUTF16 = new InputStreamReader(streamIn16, "UnicodeBig");
+            inUTF16.close();
+        } catch (UnsupportedEncodingException e) {
+            fail ("Should Support UnicodeBig");
+        }
     }

     /**
diff --git modules/nio_char/src/main/java/java/nio/charset/Charset.java modules/nio_char/src/main/java/java/nio/charset/Charset.java
index 7b8d79d..65a2593 100644
--- modules/nio_char/src/main/java/java/nio/charset/Charset.java
+++ modules/nio_char/src/main/java/java/nio/charset/Charset.java
@@ -508,6 +508,9 @@ public abstract class Charset implements Comparable<Charset> {
      *             If the desired charset is not supported by this runtime.
      */
     public static Charset forName(String charsetName) {
+        if ("UnicodeBig".equalsIgnoreCase(charsetName) || "UnicodeLittle".equalsIgnoreCase(charsetName)) {
+            charsetName = "UTF-16";
+        }
         Charset c = forNameInternal(charsetName);
         if (null == c) {
             throw new UnsupportedCharsetException(charsetName);





-- 
Yours sincerely,
Charles Lee