Posted to issues@commons.apache.org by "Michael Konietzka (JIRA)" <ji...@apache.org> on 2010/11/13 15:32:14 UTC

[jira] Created: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Some Entitys like &Ouml; are not matched properly against its ISO8859-1 representation
--------------------------------------------------------------------------------------

                 Key: LANG-658
                 URL: https://issues.apache.org/jira/browse/LANG-658
             Project: Commons Lang
          Issue Type: Bug
          Components: lang.text.translate.*
    Affects Versions: 3.0
            Reporter: Michael Konietzka
             Fix For: 3.0


In EntityArrays, some of the mappings in

 private static final String[][] ISO8859_1_ESCAPE

are wrong, for example

        {"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D8", "&times;"}, // multiplication sign

but they should be

        {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D7", "&times;"}, // multiplication sign

according to http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm
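
For illustration, here is a minimal sketch of why the shifted keys matter. The class EscapeTableDemo and its helpers are hypothetical and are not the Commons Lang implementation; only the four table rows are taken from the report above. It assumes a simple one-character lookup, which is enough to show that U+00D7 (the multiplication sign) would be escaped as &Ouml; with the reported rows.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not part of Commons Lang.
final class EscapeTableDemo {

    // The two rows as quoted in the report (keys are one too high).
    static final String[][] REPORTED = {
        {"\u00D7", "&Ouml;"},  // should be U+00D6
        {"\u00D8", "&times;"}, // should be U+00D7
    };

    // The same two rows with the keys the report says are correct.
    static final String[][] CORRECTED = {
        {"\u00D6", "&Ouml;"},
        {"\u00D7", "&times;"},
    };

    // Builds a character -> entity lookup from {character, entity} pairs.
    static Map<String, String> toMap(String[][] table) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String[] pair : table) {
            map.put(pair[0], pair[1]);
        }
        return map;
    }

    // Replaces each character that has a table entry; others pass through unchanged.
    static String escape(String input, String[][] table) {
        Map<String, String> map = toMap(table);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            String ch = String.valueOf(input.charAt(i));
            out.append(map.getOrDefault(ch, ch));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("\u00D7", REPORTED));        // prints "&Ouml;"
        System.out.println(escape("\u00D6\u00D7", CORRECTED)); // prints "&Ouml;&times;"
    }
}
```

With the reported rows, U+00D6 has no entry at all in this two-row excerpt, so it would pass through unescaped.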

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Issue Comment Edited: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931693#action_12931693 ] 

Sebb edited comment on LANG-658 at 11/13/10 11:19 AM:
------------------------------------------------------

Another duplicate entry:

{noformat} 
        {"\u00F1", "&ntilde;"}, // ñ - lowercase n, tilde
        {"\u00F3", "&ograve;"}, // ò - lowercase o, grave accent
        {"\u00F3", "&oacute;"}, // ó - lowercase o, acute accent
{noformat} 

The first F3 entry should be F2.
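
Duplicates like this could be caught mechanically. The following is only a sketch: DuplicateKeyCheck and duplicateKeys are hypothetical names, and the sample rows are the three quoted above. It assumes each key is a single character and reports every row whose key has already been seen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of Commons Lang.
final class DuplicateKeyCheck {

    // The three rows quoted in the comment above.
    static final String[][] SAMPLE = {
        {"\u00F1", "&ntilde;"},
        {"\u00F3", "&ograve;"}, // reported as wrong; should be U+00F2
        {"\u00F3", "&oacute;"},
    };

    // Returns a description of every row whose key already appeared earlier in the table.
    static List<String> duplicateKeys(String[][] table) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String[] pair : table) {
            if (!seen.add(pair[0])) {
                duplicates.add(String.format("U+%04X %s", (int) pair[0].charAt(0), pair[1]));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(duplicateKeys(SAMPLE)); // prints "[U+00F3 &oacute;]"
    }
}
```

Note that the check flags the second row with the repeated key, which here is the correct one; deciding which of the two rows is wrong still needs a reference such as the Unicode chart.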

[jira] Updated: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb updated LANG-658:
----------------------

    Description: 
In EntityArrays, some of the mappings in

 private static final String[][] ISO8859_1_ESCAPE

are wrong, for example

{noformat} 
        {"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D8", "&times;"}, // multiplication sign
{noformat} 

but they should be

{noformat} 
        {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D7", "&times;"}, // multiplication sign
{noformat} 

according to http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm

First look:

\u00CA is missing from the array, and all following entries are off by one.


Found on http://stackoverflow.com/questions/4172784/bug-in-apache-commons-stringescapeutil/4172915#4172915
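
A skipped key such as the one described under "First look" can be found by checking that the keys are consecutive. This is a sketch under assumptions: GapCheck and missingCodePoints are hypothetical names, and the sample rows are NOT copied from EntityArrays; they are a made-up excerpt in which the key U+00CA is skipped so that later keys end up one too high.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Commons Lang.
final class GapCheck {

    // Made-up excerpt (an assumption, not the real table): the key U+00CA is skipped,
    // so &Ecirc; and &Euml; carry keys that are one too high.
    static final String[][] SAMPLE = {
        {"\u00C8", "&Egrave;"},
        {"\u00C9", "&Eacute;"},
        {"\u00CB", "&Ecirc;"},
        {"\u00CC", "&Euml;"},
    };

    // Assumes single-character keys in ascending order; lists every code point
    // that is skipped between two neighbouring rows.
    static List<String> missingCodePoints(String[][] table) {
        List<String> missing = new ArrayList<>();
        for (int i = 1; i < table.length; i++) {
            int previous = table[i - 1][0].charAt(0);
            int current = table[i][0].charAt(0);
            for (int cp = previous + 1; cp < current; cp++) {
                missing.add(String.format("U+%04X", cp));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println(missingCodePoints(SAMPLE)); // prints "[U+00CA]"
    }
}
```

A gap only shows where the keys stop being consecutive; it does not by itself say whether a row was dropped or the following keys were shifted.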


[jira] Resolved: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebb resolved LANG-658.
-----------------------

       Resolution: Fixed
    Fix Version/s: 3.0

Now hopefully fixed.

[jira] Commented: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931709#action_12931709 ] 

Sebb commented on LANG-658:
---------------------------

Note: I ran a check comparing the values against the ones from lang2 Entities, and the two implementations now seem to agree.
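
The check itself is not shown in this thread. As a rough sketch of what such a comparison could look like, the following compares two tables entity by entity. TableComparison and differences are hypothetical names, and the reference array merely stands in for the lang2 data; it does not use the real lang2 Entities API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it does not call the lang2 Entities class.
final class TableComparison {

    // Rows as originally reported (candidate) and as corrected (stand-in reference).
    static final String[][] CANDIDATE = {
        {"\u00D7", "&Ouml;"},
        {"\u00D8", "&times;"},
    };
    static final String[][] REFERENCE = {
        {"\u00D6", "&Ouml;"},
        {"\u00D7", "&times;"},
    };

    // Lists every entity in candidate whose character differs from, or is absent in, reference.
    static List<String> differences(String[][] candidate, String[][] reference) {
        Map<String, String> expected = new HashMap<>();
        for (String[] pair : reference) {
            expected.put(pair[1], pair[0]); // entity -> character
        }
        List<String> diffs = new ArrayList<>();
        for (String[] pair : candidate) {
            String want = expected.get(pair[1]);
            if (want == null) {
                diffs.add(pair[1] + ": not in reference");
            } else if (!want.equals(pair[0])) {
                diffs.add(String.format("%s: got U+%04X, expected U+%04X",
                        pair[1], (int) pair[0].charAt(0), (int) want.charAt(0)));
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        for (String line : differences(CANDIDATE, REFERENCE)) {
            System.out.println(line);
        }
    }
}
```

An empty result for the full tables would correspond to "the two implementations now seem to agree".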

[jira] Updated: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Michael Konietzka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Konietzka updated LANG-658:
-----------------------------------

    Fix Version/s:     (was: 3.0)
      Description: 
In EntityArrays, some of the mappings in

 private static final String[][] ISO8859_1_ESCAPE

are wrong, for example

        {"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D8", "&times;"}, // multiplication sign

but they should be

        {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D7", "&times;"}, // multiplication sign

according to http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm

First look:

\u00CA is missing from the array, and all following entries are off by one.


[jira] Closed: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Michael Konietzka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Konietzka closed LANG-658.
----------------------------------


[jira] Updated: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Michael Konietzka (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Konietzka updated LANG-658:
-----------------------------------

    Description: 
In EntityArrays, some of the mappings in

 private static final String[][] ISO8859_1_ESCAPE

are wrong, for example

        {"\u00D7", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D8", "&times;"}, // multiplication sign

but they should be

        {"\u00D6", "&Ouml;"}, // Ö - uppercase O, umlaut
        {"\u00D7", "&times;"}, // multiplication sign

according to http://www.fileformat.info/info/unicode/block/latin_supplement/list.htm

First look:

\u00CA is missing from the array, and all following entries are off by one.


Found on http://stackoverflow.com/questions/4172784/bug-in-apache-commons-stringescapeutil/4172915#4172915


[jira] Commented: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931691#action_12931691 ] 

Sebb commented on LANG-658:
---------------------------

Later on, there are two instances of E5:

        {"\u00E5", "&auml;"}, // ä - lowercase a, umlaut
        {"\u00E5", "&aring;"}, // å - lowercase a, ring

The latter is correct, and subsequent entries seem OK.
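
To illustrate what the duplicated E5 key would do in practice, here is a small sketch. InvertDemo and its two helpers are hypothetical, not the Commons Lang code; the two rows are the ones quoted above. It relies on the HTML definition of &auml; as U+00E4, and assumes plain map lookups for both directions.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; not part of Commons Lang.
final class InvertDemo {

    // The two rows quoted in the comment above.
    static final String[][] REPORTED = {
        {"\u00E5", "&auml;"},  // HTML defines &auml; as U+00E4
        {"\u00E5", "&aring;"},
    };

    // character -> entity, as used for escaping; a later row overwrites an earlier one.
    static Map<String, String> escapeMap(String[][] table) {
        Map<String, String> map = new HashMap<>();
        for (String[] pair : table) {
            map.put(pair[0], pair[1]);
        }
        return map;
    }

    // entity -> character, as used for unescaping.
    static Map<String, String> unescapeMap(String[][] table) {
        Map<String, String> map = new HashMap<>();
        for (String[] pair : table) {
            map.put(pair[1], pair[0]);
        }
        return map;
    }

    public static void main(String[] args) {
        // &auml; would be unescaped to U+00E5 (a with ring) instead of U+00E4.
        System.out.println("\u00E5".equals(unescapeMap(REPORTED).get("&auml;"))); // prints "true"
        // U+00E4 has no entry, so it would never be escaped.
        System.out.println(escapeMap(REPORTED).containsKey("\u00E4"));            // prints "false"
        // U+00E5 still escapes correctly because the second row wins.
        System.out.println(escapeMap(REPORTED).get("\u00E5"));                    // prints "&aring;"
    }
}
```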

[jira] Commented: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931693#action_12931693 ] 

Sebb commented on LANG-658:
---------------------------

Another duplicate entry:

        {"\u00F1", "&ntilde;"}, // ñ - lowercase n, tilde
        {"\u00F3", "&ograve;"}, // ò - lowercase o, grave accent
        {"\u00F3", "&oacute;"}, // ó - lowercase o, acute accent

The first F3 entry should be F2.

[jira] Issue Comment Edited: (LANG-658) Some Entitys like Ö are not matched properly against its ISO8859-1 representation

Posted by "Sebb (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/LANG-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12931691#action_12931691 ] 

Sebb edited comment on LANG-658 at 11/13/10 11:19 AM:
------------------------------------------------------

Later on, there are two instances of E5:

{noformat} 
        {"\u00E5", "&auml;"}, // ä - lowercase a, umlaut
        {"\u00E5", "&aring;"}, // å - lowercase a, ring
{noformat} 

The latter is correct, and subsequent entries seem OK.
