You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2007/05/18 14:29:16 UTC

[jira] Created: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

%a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
-----------------------------------------------------------------------------------------------

                 Key: DERBY-2670
                 URL: https://issues.apache.org/jira/browse/DERBY-2670
             Project: Derby
          Issue Type: Bug
          Components: SQL
    Affects Versions: 10.3.0.0
            Reporter: Knut Anders Hatlen


On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:

ij> select * from t where x like '%a%';
X                   
--------------------
Waagan              
Wåhan               
Wanvik              
Wågan               
ekstrabetaling      
ekstra­arbeid       
­a                  
a                   
-a                  
 a                  
B                   
C                   

12 rows selected

The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.

Similar wrong results are seen with '%aa%':

ij> select * from t where x like '%aa%';
X                   
--------------------
Waagan              
ekstraarbeid        
B                   

3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12499405 ] 

Knut Anders Hatlen commented on DERBY-2670:
-------------------------------------------

Thanks Mike, your check-in solved the error in the repro. However, I still see some strange results. Using the same table as in the repro, "a%" matches "\u00ADa", "a", "-a", " a" and "A", which doesn't seem correct to me. Similar incorrect results with "b%" and "c%". Is this the same issue, or should I open a new one?

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>             Fix For: 10.3.0.0
>
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-2670:
----------------------------------


This problem is very sensitive to the data that exists in the table.  Removing a row that should not be affected by the query
changes the results.  

I don't know if this is the problem, but it looks like the WorkHorseForCollatorDatatypes class may cache values in it..  Derby
may reuse a single DataValueDescriptor for multiple values.  This is why code like readExternal goes out of it's way to 
reset fields in the datastructure when reading in a new value.  

I think  this has to somehow get accomplished for the various uses of WorkHorseForCollatorDatatypes in:
CollatorSQLChar.java         CollatorSQLLongvarchar.java
CollatorSQLClob.java         CollatorSQLVarchar.java

This problem is  showing up particularly for like as we create an array of collationElementsForString  and cache it so if we
a like again we don't rebuild it. 

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2670:
--------------------------------------

    Attachment: coll.java

Attaching the same repro as a Java application. In the Java application I replaced 'å' with '\u00E5' and soft hyphen with '\u00AD' (to make the repro independent of locale/terminal and to make it clearer which characters the strings contained).

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen closed DERBY-2670.
-------------------------------------


Thank you for opening a new issue, Mike. Since the reported problem is fixed, I'm closing this issue.

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>             Fix For: 10.3.0.0
>
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-2670:
----------------------------------


I have logged DERBY-2706 for the new problems you found knut.  My first guess is that this is not related to the previous problem, thus logged a new issue.  Of course won't really know until it is fixed.  If you come up with any other cases that 
you think would be worthwhile add them to that issue and hopefully we can eventually get them into a junit test. 

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>             Fix For: 10.3.0.0
>
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali updated DERBY-2670:
----------------------------------


thanks very much for the early testing.  And we appreciate the java repro it makes it a lot easier for us to demonstrate and debug the issue here.

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2670:
--------------------------------------

    Attachment: collrepro.sql

Attaching ij script which reproduces the bug.

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Attachments: collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali reassigned DERBY-2670:
-------------------------------------

    Assignee: Mike Matrigali

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (DERBY-2670) %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mike Matrigali resolved DERBY-2670.
-----------------------------------

       Resolution: Fixed
    Fix Version/s: 10.3.0.0

This issue is fixed by the following checkin.  This was only a problem in new code introduced in the trunk and
never affected any Derby release.

Changed WorkHorseForCollatorDatatypes to not cache info about the collation
elements across calls to like on the object.  This info needed to get
invalidated when object for a column was reused to process the next row.
Added a test to CollationTest2 based on bug report.

m2_ibm15:123>svn commit

Sending        java\engine\org\apache\derby\iapi\types\WorkHorseForCollatorDatatypes.java
Sending        java\testing\org\apache\derbyTesting\functionTests\tests\lang\CollationTest2.java
Transmitting file data ..
Committed revision 541822.

> %a% and %aa% match too many rows in database with collation=TERRITORY_BASED and territory=no_NO
> -----------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2670
>                 URL: https://issues.apache.org/jira/browse/DERBY-2670
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.0.0
>            Reporter: Knut Anders Hatlen
>         Assigned To: Mike Matrigali
>             Fix For: 10.3.0.0
>
>         Attachments: coll.java, collrepro.sql
>
>
> On a database with territory=no_NO and collation=TERRITORY_BASED, I get these results from a query which uses the LIKE predicate:
> ij> select * from t where x like '%a%';
> X                   
> --------------------
> Waagan              
> Wåhan               
> Wanvik              
> Wågan               
> ekstrabetaling      
> ekstra­arbeid       
> ­a                  
> a                   
> -a                  
>  a                  
> B                   
> C                   
> 12 rows selected
> The last two rows ('B' and 'C') should not have been included in the result, since they do not match the pattern '%a%'.
> Similar wrong results are seen with '%aa%':
> ij> select * from t where x like '%aa%';
> X                   
> --------------------
> Waagan              
> ekstraarbeid        
> B                   
> 3 rows selected

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.