Posted to commits@harmony.apache.org by "Robert Muir (JIRA)" <ji...@apache.org> on 2010/09/02 15:27:53 UTC
[jira] Created: (HARMONY-6640) UTF8 decoder doesn't properly decode
supplementary characters
UTF8 decoder doesn't properly decode supplementary characters
-------------------------------------------------------------
Key: HARMONY-6640
URL: https://issues.apache.org/jira/browse/HARMONY-6640
Project: Harmony
Issue Type: Bug
Components: Classlib
Affects Versions: 5.0M14
Environment: Windows Vista
Reporter: Robert Muir
When attempting to build Lucene, I discovered a problem with UTF-8 decoding.
(This actually prevents our tests from even compiling without a workaround.)
For any code point > 0xFFFF (a 4-byte UTF-8 sequence), the decoder doesn't properly
split the decoded code point into a surrogate pair.
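For illustration, here is a minimal sketch of the step the report describes: turning one 4-byte UTF-8 sequence into a code point and then into a UTF-16 surrogate pair. It is not the Harmony decoder and not the attached patch. The class and method names (SurrogateSplit, decodeFourByte, toSurrogates) are hypothetical, and the input is assumed to be a well-formed 4-byte sequence, so no validation is done.

```java
public class SurrogateSplit {
    /** Decodes one well-formed 4-byte UTF-8 sequence (hypothetical helper, no validation). */
    static int decodeFourByte(byte[] b) {
        return ((b[0] & 0x07) << 18)   // 3 payload bits from the lead byte 11110xxx
             | ((b[1] & 0x3F) << 12)   // 6 payload bits from each continuation byte 10xxxxxx
             | ((b[2] & 0x3F) << 6)
             |  (b[3] & 0x3F);
    }

    /** Splits a supplementary code point (above 0xFFFF) into a high/low surrogate pair. */
    static char[] toSurrogates(int codePoint) {
        int v = codePoint - 0x10000;                 // 20-bit value
        char high = (char) (0xD800 + (v >> 10));     // top 10 bits
        char low  = (char) (0xDC00 + (v & 0x3FF));   // bottom 10 bits
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) in UTF-8 is F0 9D 84 9E
        byte[] utf8 = { (byte) 0xF0, (byte) 0x9D, (byte) 0x84, (byte) 0x9E };
        int cp = decodeFourByte(utf8);
        char[] pair = toSurrogates(cp);
        System.out.printf("U+%X -> high=0x%04X low=0x%04X%n",
                cp, (int) pair[0], (int) pair[1]);
    }
}
```

A decoder that writes the code point as a single char, or skips the subtraction of 0x10000, would produce a string that no longer equals the original.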
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode
supplementary characters
Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mark Hindess updated HARMONY-6640:
----------------------------------
Attachment: nio_char.jar
Here is an updated nio_char.jar.
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909440#action_12909440 ]
Robert Muir commented on HARMONY-6640:
--------------------------------------
Great, the M15 build fixed problems with our mmap-ed IO that I was seeing in M14.
Unfortunately, there are no Lucene test results with Harmony available at the moment.
Perhaps a Hudson job for this would be useful.
[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode
supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated HARMONY-6640:
---------------------------------
Attachment: HARMONY-6640.patch
Attached is an improved version of the patch I sent to the mailing list.
Below is the original simple test case I supplied:
public void testUTF8() throws Exception {
    // U+1D11E: MUSICAL SYMBOL G CLEF
    String s = new StringBuilder().appendCodePoint(0x1D11E).toString();
    byte[] utf8 = s.getBytes("UTF-8");
    assertEquals(s, new String(utf8, 0, utf8.length, "UTF-8"));
}
I also ran round-trip tests with randomly generated strings... but I'm not
set up to build Harmony on my machine, so I apologize for the lack of a test
case in the actual patch.
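The random round-trip tests mentioned above are not attached to the issue, so the following is only a sketch of what such a test might look like. The class name Utf8RoundTrip, the fixed seed, and the string lengths are assumptions, and it uses StandardCharsets, which requires Java 7 or later rather than the Java 5 level Harmony targeted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class Utf8RoundTrip {
    /** Builds a random string of valid code points, about half of them supplementary. */
    static String randomString(Random rnd, int codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints; i++) {
            int cp;
            if (rnd.nextBoolean()) {
                // Supplementary range U+10000..U+10FFFF (4-byte UTF-8 sequences)
                cp = 0x10000 + rnd.nextInt(0x100000);
            } else {
                // BMP below the surrogate block, U+0000..U+D7FF
                cp = rnd.nextInt(0xD800);
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    /** Encodes to UTF-8 and decodes again; true if the string survives unchanged. */
    static boolean roundTrips(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return s.equals(new String(utf8, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L); // fixed seed so a failure can be reproduced
        for (int i = 0; i < 1000; i++) {
            String s = randomString(rnd, 1 + rnd.nextInt(20));
            if (!roundTrips(s)) {
                throw new AssertionError("round trip failed at iteration " + i);
            }
        }
        System.out.println("all strings round-tripped");
    }
}
```

Unpaired surrogates are deliberately never generated, because they cannot be encoded as well-formed UTF-8 and would fail the comparison for reasons unrelated to this bug.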
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909430#action_12909430 ]
Mark Hindess commented on HARMONY-6640:
---------------------------------------
Excellent. 5.0M15 should be on the mirrors now in the obvious location. It'll probably be announced tomorrow.
Are the Lucene test results with Harmony accessible on the web?
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909817#action_12909817 ]
Hudson commented on HARMONY-6640:
---------------------------------
Integrated in Harmony-1.5-head-linux-x86_64 #944 (See [https://hudson.apache.org/hudson/job/Harmony-1.5-head-linux-x86_64/944/])
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909269#action_12909269 ]
Robert Muir commented on HARMONY-6640:
--------------------------------------
> Can you retest Lucene and see how it goes now? I can send you an updated "nio_char.jar" if required.
Absolutely! If you can send me the jar, that would be extremely helpful, because I'm not set up to build
Harmony on my platform (Windows)...
This fix would get us much further in the testing process, and allow us to fix test failures on Harmony that
are really our fault ("sunisms", etc.).
By the way, I created LUCENE-2630 to track some of this.
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909867#action_12909867 ]
Hudson commented on HARMONY-6640:
---------------------------------
Integrated in Harmony-select-1.5-head-linux-x86_64 #106 (See [https://hudson.apache.org/hudson/job/Harmony-select-1.5-head-linux-x86_64/106/])
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909265#action_12909265 ]
Tim Ellison commented on HARMONY-6640:
--------------------------------------
Can you retest Lucene and see how it goes now? I can send you an updated "nio_char.jar" if required.
[jira] Resolved: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Ellison resolved HARMONY-6640.
----------------------------------
Fix Version/s: 5.0M16
Resolution: Fixed
Thanks Robert.
Patch applied to nio_char module at repo revision r996904.
Please check it was applied as you expected.
[jira] Assigned: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Ellison reassigned HARMONY-6640:
------------------------------------
Assignee: Tim Ellison
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Mark Hindess (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909448#action_12909448 ]
Mark Hindess commented on HARMONY-6640:
---------------------------------------
I was thinking the same thing, particularly when I was attaching the luni.jar to the other JIRA. (Unfortunately, the Hudson slave that builds Harmony is offline; otherwise I'd have suggested you use the jar from there, as luni.jar happens to be platform independent.)
[jira] Updated: (HARMONY-6640) UTF8 decoder doesn't properly decode
supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Muir updated HARMONY-6640:
---------------------------------
Attachment: HARMONY-6640.patch
Sorry: I forgot to also handle the non-hasArray() case, which has the same bug.
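As a hedged illustration of why both cases need coverage: a decoder may take one code path when the input buffer exposes a backing array (hasArray() returns true) and another when it does not. The sketch below feeds the same bytes through both kinds of buffer using only standard java.nio calls. Whether a given decoder actually branches on hasArray() is an assumption here, and the class name DecodeBothPaths is hypothetical. A read-only view is used for the second case because its hasArray() is specified to return false; direct buffers typically behave the same way.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class DecodeBothPaths {
    /** Decodes the remaining bytes as UTF-8, failing loudly on malformed input. */
    static String decode(ByteBuffer in) {
        try {
            return StandardCharsets.UTF_8.newDecoder().decode(in).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException("malformed UTF-8", e);
        }
    }

    public static void main(String[] args) {
        // U+1D11E: MUSICAL SYMBOL G CLEF, a supplementary character
        String s = new StringBuilder().appendCodePoint(0x1D11E).toString();
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);

        // Array-backed input: hasArray() is true
        ByteBuffer heap = ByteBuffer.wrap(utf8);
        // Read-only view: the backing array is not accessible, so hasArray() is false
        ByteBuffer readOnly = ByteBuffer.wrap(utf8).asReadOnlyBuffer();

        System.out.println("hasArray=" + heap.hasArray()
                + " roundTrip=" + s.equals(decode(heap)));
        System.out.println("hasArray=" + readOnly.hasArray()
                + " roundTrip=" + s.equals(decode(readOnly)));
    }
}
```

A test that only goes through new String(byte[], ...) may exercise just one of the two paths, which is how a bug like this can survive an initial fix.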
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909406#action_12909406 ]
Robert Muir commented on HARMONY-6640:
--------------------------------------
Thanks Mark!
By the way: it's looking much better since I switched from M14 to http://people.apache.org/~hindessm/milestones/5.0M15/
[jira] Closed: (HARMONY-6640) UTF8 decoder doesn't properly decode
supplementary characters
Posted by "Tim Ellison (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Ellison closed HARMONY-6640.
--------------------------------
[jira] Commented: (HARMONY-6640) UTF8 decoder doesn't properly
decode supplementary characters
Posted by "Robert Muir (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HARMONY-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909261#action_12909261 ]
Robert Muir commented on HARMONY-6640:
--------------------------------------
Thank you! The commit looks good to me.