Posted to dev@harmony.apache.org by "Vladimir Strigun (JIRA)" <ji...@apache.org> on 2006/02/27 08:11:27 UTC

[jira] Created: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

CharsetDecoder should replace undefined bytes with replacement string
---------------------------------------------------------------------

         Key: HARMONY-137
         URL: http://issues.apache.org/jira/browse/HARMONY-137
     Project: Harmony
        Type: Bug
  Components: Classlib  
    Reporter: Vladimir Strigun
    Priority: Minor


According to the cp1250 mapping table, byte 0x81 is undefined. See http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1250.TXT
So the charset decoder should replace undefined bytes with the default replacement, i.e. U+FFFD.
Test case to reproduce this issue:

import java.nio.charset.*;
import java.nio.*;

public class Harmony137 {
    public static void main(String[] args) throws Exception {
        // A single byte, 0x81, which the cp1250 mapping table leaves undefined.
        ByteBuffer bb = ByteBuffer.allocate(5);
        bb.put((byte)0x81); bb.flip();
        Charset cp1250 = Charset.forName("cp1250");
        CharBuffer cb = cp1250.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .decode(bb);
        // 0xFFFD is the decoder's default replacement character.
        if(cb.get(0)!=0xFFFD) {
            System.out.println("FAIL: expected 0xFFFD but result is: 0x"+Integer.toHexString(cb.get(0)).toUpperCase());
        }
    }
}
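
For reference, the replacement mechanism the report relies on can be seen in isolation. The following is only a sketch and not part of the original test case: the class name ReplacementDemo and the helper decodeAscii are made up for illustration, and US-ASCII is used instead of cp1250 because 0x81 is invalid in US-ASCII on any runtime, so the result does not depend on the converter tables under discussion.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Hypothetical helper: decode bytes as US-ASCII with REPLACE actions,
    // optionally overriding the decoder's replacement string.
    static String decodeAscii(byte[] bytes, String replacement) {
        CharsetDecoder dec = StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        if (replacement != null) {
            dec.replaceWith(replacement);
        }
        try {
            return dec.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] input = { 'a', (byte) 0x81, 'b' };
        // Default replacement: the middle character is U+FFFD.
        System.out.println(Integer.toHexString(decodeAscii(input, null).charAt(1))); // fffd
        // Custom replacement string.
        System.out.println(decodeAscii(input, "?")); // a?b
    }
}
```

The replacement is only applied to bytes the decoder itself reports as malformed or unmappable, so a decoder whose table maps 0x81 to some character will never produce U+FFFD for it, whatever the error actions are.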

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by "Vladimir Strigun (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HARMONY-137?page=comments#action_12368301 ] 

Vladimir Strigun commented on HARMONY-137:
------------------------------------------

Tim, I agree with the resolution, please close it.



[jira] Closed: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-137?page=all ]
     
Tim Ellison closed HARMONY-137:
-------------------------------


Verified by Vladimir.



[jira] Resolved: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HARMONY-137?page=all ]
     
Tim Ellison resolved HARMONY-137:
---------------------------------

    Resolution: Won't Fix

For the reasons Richard and the ICU team give, this is being marked as won't fix.




[jira] Commented: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by "Richard Liang (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HARMONY-137?page=comments#action_12368087 ] 

Richard Liang commented on HARMONY-137:
---------------------------------------

Please see the bug report in the ICU bug system: http://bugs.icu-project.org/cgi-bin/icu-bugs?findid=5085&go=Go

Here is the ICU team's response to this bug:

You are expecting incorrect behavior from cp1250. Both Microsoft's conversion APIs and IBM mapping tables convert byte 81 to Unicode character 0081. This conversion behavior will not change. The tables on unicode.org may tell you about the official mappings, but there are other mappings that are commonly expected.

More details about ICU charset conversion can be found on this page: http://icu.sourceforge.net/charts/charset/

This charset conversion works as expected.
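
The two conventions described in the response can be told apart with a small probe. This is a sketch only, not something from the ICU team or the Harmony code: the class name DecodeProbe and its helpers are hypothetical. ISO-8859-1 (which maps 0x81 to U+0081) and US-ASCII (where 0x81 is invalid) serve as fixed reference points; what "cp1250" prints depends on the converter tables of the runtime it is run on.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeProbe {
    // Hypothetical helper: decode one byte with REPLACE actions and return
    // the resulting character. Assumes a single-byte charset.
    static char decodeSingleByte(Charset cs, int b) {
        ByteBuffer in = ByteBuffer.wrap(new byte[] { (byte) b });
        try {
            return cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPLACE)
                    .onUnmappableCharacter(CodingErrorAction.REPLACE)
                    .decode(in)
                    .get(0);
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but decode() declares it.
            throw new IllegalStateException(e);
        }
    }

    static String describe(Charset cs, int b) {
        return String.format("%s: 0x%02X -> U+%04X",
                cs.name(), b, (int) decodeSingleByte(cs, b));
    }

    public static void main(String[] args) {
        System.out.println(describe(StandardCharsets.ISO_8859_1, 0x81)); // ISO-8859-1: 0x81 -> U+0081
        System.out.println(describe(StandardCharsets.US_ASCII, 0x81));   // US-ASCII: 0x81 -> U+FFFD
        // cp1250 is not guaranteed to be available; the result depends on the provider.
        if (Charset.isSupported("cp1250")) {
            System.out.println(describe(Charset.forName("cp1250"), 0x81));
        }
    }
}
```

A cp1250 result of U+0081 corresponds to the behavior the ICU team describes; U+FFFD corresponds to the unicode.org table that the report cites.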





Re: [jira] Commented: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by Tim Ellison <t....@gmail.com>.
Paulex,

Please add the ICU bug number to this JIRA when you have it -- thanks.

Regards,
Tim

Paulex Yang (JIRA) wrote:
>     [ http://issues.apache.org/jira/browse/HARMONY-137?page=comments#action_12367917 ] 
> 
> Paulex Yang commented on HARMONY-137:
> -------------------------------------
> 
> A little investigation on the Harmony codes, seems it is caused by problems of ICU4JNI decoder provider, the following test cases shows that.  RI "cp1250" passes the testcase while ICU "cp1250" fails under either RI or Harmony . I'll try to report the bug to ICU.
> 
> Test case:
> 
> 	public void testDecode_JIRA137() {
> 		ByteBuffer bb = ByteBuffer.allocate(5);
> 		bb.put((byte) 0x81);
> 		bb.flip();
> 		// Use ICU cp1250 charset
> 		CharsetProviderICU provider = new CharsetProviderICU();
> 		Charset cp1250 = provider.charsetForName("cp1250");
> 		// Uncomment code below to use RI charset
> 		//cp1250 = Charset.forName("cp1250");
> 		CharBuffer cb;
> 		try {
> 			cb = cp1250.newDecoder()
> 					.onMalformedInput(CodingErrorAction.REPLACE)
> 					.onUnmappableCharacter(CodingErrorAction.REPLACE)
> 					.decode(bb);
> 			assertEquals(0XFFFD,cb.get(0));
> 		} catch (CharacterCodingException e) {
> 			e.printStackTrace();
> 		}
> 	}
> 

-- 

Tim Ellison (t.p.ellison@gmail.com)
IBM Java technology centre, UK.

[jira] Commented: (HARMONY-137) CharsetDecoder should replace undefined bytes with replacement string

Posted by "Paulex Yang (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/HARMONY-137?page=comments#action_12367917 ] 

Paulex Yang commented on HARMONY-137:
-------------------------------------

After a little investigation of the Harmony code, this seems to be caused by the ICU4JNI decoder provider, as the following test case shows. The RI "cp1250" charset passes the test case, while the ICU "cp1250" charset fails under both the RI and Harmony. I'll try to report the bug to ICU.

Test case:

	public void testDecode_JIRA137() throws CharacterCodingException {
		ByteBuffer bb = ByteBuffer.allocate(5);
		bb.put((byte) 0x81);
		bb.flip();
		// Use ICU cp1250 charset
		CharsetProviderICU provider = new CharsetProviderICU();
		Charset cp1250 = provider.charsetForName("cp1250");
		// Uncomment code below to use RI charset
		//cp1250 = Charset.forName("cp1250");
		// Let a CharacterCodingException fail the test instead of swallowing it
		CharBuffer cb = cp1250.newDecoder()
				.onMalformedInput(CodingErrorAction.REPLACE)
				.onUnmappableCharacter(CodingErrorAction.REPLACE)
				.decode(bb);
		assertEquals(0xFFFD, cb.get(0));
	}
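
When comparing providers, it can also help to ask a decoder directly which bytes it rejects, rather than inferring that from the replacement character. The following is a rough sketch under that assumption; the class name UnmappedScan and its methods are hypothetical and are not part of Harmony or ICU4JNI. It uses CodingErrorAction.REPORT so that a rejected byte surfaces as a CharacterCodingException, and it is only meaningful for single-byte charsets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class UnmappedScan {
    // True if the decoder maps the single byte to a character, false if it
    // reports the byte as malformed or unmappable.
    static boolean isAccepted(Charset cs, int b) {
        try {
            cs.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(new byte[] { (byte) b }));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // All byte values 0x00..0xFF that the decoder rejects.
    static List<Integer> rejectedBytes(Charset cs) {
        List<Integer> rejected = new ArrayList<>();
        for (int b = 0; b <= 0xFF; b++) {
            if (!isAccepted(cs, b)) {
                rejected.add(b);
            }
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("ISO-8859-1 rejects " + rejectedBytes(StandardCharsets.ISO_8859_1).size() + " bytes");
        System.out.println("US-ASCII rejects " + rejectedBytes(StandardCharsets.US_ASCII).size() + " bytes");
        // The cp1250 result depends on the provider's converter table.
        if (Charset.isSupported("cp1250")) {
            for (int b : rejectedBytes(Charset.forName("cp1250"))) {
                System.out.printf("cp1250 rejects 0x%02X%n", b);
            }
        }
    }
}
```

If a provider's cp1250 decoder lists 0x81 here, REPLACE will yield U+FFFD for it; if 0x81 is absent from the list, the decoder has a mapping for that byte and the test case above cannot pass with that provider.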
