
Posted to commits@harmony.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/09/02 15:27:53 UTC

[jira] Created: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

UTF8 decoder doesn't properly decode supplementary characters
-------------------------------------------------------------

                 Key: HARMONY-6640
                 URL: https://issues.apache.org/jira/browse/HARMONY-6640
             Project: Harmony
          Issue Type: Bug
          Components: Classlib
    Affects Versions: 5.0M14
         Environment: Windows Vista
            Reporter: Robert Muir


When attempting to build Lucene, I discovered a problem with UTF-8 decoding.
(This actually prevents our tests from even compiling without a workaround.)

For any code point > 0xFFFF (a 4-byte UTF-8 sequence), the decoder doesn't properly
split the decoded code point into a surrogate pair.
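
For illustration only, here is a minimal sketch of the expected behaviour: a
well-formed 4-byte UTF-8 sequence is decoded into a code point, which is then
split into a UTF-16 surrogate pair. The class and method names are hypothetical,
and this is not the Harmony decoder code or the attached patch; validation of
malformed input is deliberately left out.

```java
final class SurrogateSplitSketch {

    /** Decodes one well-formed 4-byte UTF-8 sequence (no validation is done). */
    static int decodeFourByte(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0x07) << 18)   // 3 payload bits from the lead byte
             | ((b1 & 0x3F) << 12)   // 6 payload bits from each trailing byte
             | ((b2 & 0x3F) << 6)
             |  (b3 & 0x3F);
    }

    /** Splits a supplementary code point (> 0xFFFF) into high and low surrogates. */
    static char[] toSurrogates(int codePoint) {
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));
        char low = (char) (0xDC00 + (offset & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is F0 9D 84 9E in UTF-8.
        int cp = decodeFourByte((byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E);
        char[] pair = toSurrogates(cp);
        // prints "U+1D11E -> D834 DD1E"
        System.out.printf("U+%X -> %04X %04X%n", cp, (int) pair[0], (int) pair[1]);
    }
}
```

The result should agree with what Character.toChars(int) returns for the same
code point.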


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Hindess updated HARMONY-6640:
----------------------------------

    Attachment: nio_char.jar

Here is an updated nio_char.jar.



[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909440#action_12909440 ] 

Robert Muir commented on HARMONY-6640:
--------------------------------------

Great, the M15 build fixed problems with our mmap-ed IO that I was seeing in M14.

Unfortunately, there are no Lucene test results with Harmony available at the moment.

Perhaps a Hudson job for this would be useful.




[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated HARMONY-6640:
---------------------------------

    Attachment: HARMONY-6640.patch

Attached is an improved version of the patch I sent to the mailing list.

Below is the original simple test case I supplied:

  public void testUTF8() throws Exception {
    // U+1D11E: MUSICAL SYMBOL G CLEF
    String s = new StringBuilder().appendCodePoint(0x1D11E).toString();
    byte[] utf8 = s.getBytes("UTF-8");
    // Round-trip through UTF-8: the decoded string must equal the original.
    assertEquals(s, new String(utf8, 0, utf8.length, "UTF-8"));
  }

I also ran round-trip tests with randomly generated strings, but I'm not
set up to build Harmony on my machine, so I apologize for the lack of a test
case in the actual patch.
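
A round-trip test of that kind could look roughly like the sketch below. It is
not the test that was actually run: the class name, method names, seed, and
string lengths are all assumptions made for illustration, and it uses
StandardCharsets (Java 7+) rather than the "UTF-8" charset name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

final class Utf8RoundTripSketch {

    /** Builds a string of the given number of random, non-surrogate code points. */
    static String randomString(Random random, int codePoints) {
        StringBuilder sb = new StringBuilder();
        int appended = 0;
        while (appended < codePoints) {
            int cp = random.nextInt(0x110000);
            if (cp >= 0xD800 && cp <= 0xDFFF) {
                continue; // lone surrogates cannot be encoded as UTF-8
            }
            sb.appendCodePoint(cp);
            appended++;
        }
        return sb.toString();
    }

    /** Encodes to UTF-8, decodes again, and compares with the original. */
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return s.equals(new String(utf8, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Random random = new Random(42); // fixed seed so failures are reproducible
        for (int i = 0; i < 1000; i++) {
            String s = randomString(random, random.nextInt(20));
            if (!roundTrips(s)) {
                throw new AssertionError("round trip failed at iteration " + i);
            }
        }
    }
}
```

Drawing from the whole code point range means most of the generated characters
are supplementary, so the 4-byte path gets exercised heavily.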




[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909430#action_12909430 ] 

Mark Hindess commented on HARMONY-6640:
---------------------------------------

Excellent.  5.0M15 should be on the mirrors now in the obvious location.  It'll probably be announced tomorrow.

Are the Lucene test results with Harmony accessible on the web?






[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909817#action_12909817 ] 

Hudson commented on HARMONY-6640:
---------------------------------

Integrated in Harmony-1.5-head-linux-x86_64 #944 (See [https://hudson.apache.org/hudson/job/Harmony-1.5-head-linux-x86_64/944/])
    



[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909269#action_12909269 ] 

Robert Muir commented on HARMONY-6640:
--------------------------------------

> Can you retest Lucene and see how it goes now? I can send you an updated "nio_char.jar" if required.

Absolutely! If you can send me the jar, that would be extremely helpful, because I'm not set up to build
Harmony on my platform (Windows).

This fix would get us much further in the testing process, and allow us to fix test failures on Harmony that
are really our fault ("sunisms", etc.).

By the way, I created LUCENE-2630 to track some of this.




[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909867#action_12909867 ] 

Hudson commented on HARMONY-6640:
---------------------------------

Integrated in Harmony-select-1.5-head-linux-x86_64 #106 (See [https://hudson.apache.org/hudson/job/Harmony-select-1.5-head-linux-x86_64/106/])
    



[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909265#action_12909265 ] 

Tim Ellison commented on HARMONY-6640:
--------------------------------------

Can you retest Lucene and see how it goes now?  I can send you an updated "nio_char.jar" if required.



[jira] Resolved: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Ellison resolved HARMONY-6640.
----------------------------------

    Fix Version/s: 5.0M16
       Resolution: Fixed

Thanks Robert.

Patch applied to nio_char module at repo revision r996904.

Please check it was applied as you expected.




[jira] Assigned: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Ellison reassigned HARMONY-6640:
------------------------------------

    Assignee: Tim Ellison



[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909448#action_12909448 ] 

Mark Hindess commented on HARMONY-6640:
---------------------------------------

I was thinking the same thing, particularly when I was attaching the luni.jar to the other JIRA. (Unfortunately, the Hudson slave that builds Harmony is offline; otherwise I'd have suggested you use the jar from there, as luni.jar happens to be platform independent.)




[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated HARMONY-6640:
---------------------------------

    Attachment: HARMONY-6640.patch

Sorry: I forgot to handle the non-hasArray() case as well, which has the same bug.
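
As a rough illustration of the two cases, the sketch below decodes the same
bytes once from a heap ByteBuffer (hasArray() is true) and once from a direct
ByteBuffer (hasArray() is false). The class and method names are hypothetical,
and only the input buffer kind is varied here; whether this reaches the same
internal branches as the patched Harmony decoder is an assumption, not
something taken from the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class BufferKindSketch {

    /** Decodes UTF-8 bytes from either a heap buffer or a direct buffer. */
    static String decode(byte[] utf8, boolean direct) {
        ByteBuffer in = direct
                ? ByteBuffer.allocateDirect(utf8.length) // hasArray() == false
                : ByteBuffer.allocate(utf8.length);      // hasArray() == true
        in.put(utf8);
        in.flip(); // switch the buffer from writing to reading
        // Charset.decode replaces malformed input instead of throwing.
        return StandardCharsets.UTF_8.decode(in).toString();
    }

    public static void main(String[] args) {
        // U+1D11E: MUSICAL SYMBOL G CLEF
        String s = new StringBuilder().appendCodePoint(0x1D11E).toString();
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (!s.equals(decode(utf8, false)) || !s.equals(decode(utf8, true))) {
            throw new AssertionError("heap and direct buffers should decode identically");
        }
    }
}
```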




[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909406#action_12909406 ] 

Robert Muir commented on HARMONY-6640:
--------------------------------------

Thanks Mark!

By the way: it's looking much better since I switched from M14 to http://people.apache.org/~hindessm/milestones/5.0M15/




[jira] Closed: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Ellison closed HARMONY-6640.
--------------------------------




[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly decode supplementary characters

Posted by "Robert Muir (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909261#action_12909261 ] 

Robert Muir commented on HARMONY-6640:
--------------------------------------

Thank you! The commit looks good to me.


