You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2008/12/05 17:38:44 UTC

[jira] Created: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

SELECT DISTINCT may return duplicates with territory-based collation
--------------------------------------------------------------------

                 Key: DERBY-3975
                 URL: https://issues.apache.org/jira/browse/DERBY-3975
             Project: Derby
          Issue Type: Bug
          Components: SQL
    Affects Versions: 10.4.2.0
            Reporter: Knut Anders Hatlen


I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:

ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
ij> create table t (x varchar(10));
0 rows inserted/updated/deleted
ij> insert into t values 'a','b','abba','baab','ABBA';
5 rows inserted/updated/deleted
ij> select distinct * from t;
X         
----------
ABBA      
b         
a         
abba      

4 rows selected
ij> select distinct * from t order by x;
X         
----------
a         
abba      
ABBA      

3 rows selected

The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor resolved DERBY-3975.
------------------------------------

       Resolution: Fixed
    Fix Version/s: 10.3.3.1

Merged the fix to the 10.3 branch(had to do some manual tweaking to the CollationTest) and committed revision 733094. 


> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.0.0
>
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Reopened: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mamta A. Satoor reopened DERBY-3975:
------------------------------------


Reopening the issue to merge the changes into 10.3 codeline

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.4.2.1, 10.5.0.0
>
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen closed DERBY-3975.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 10.4.2.1

Merged the fix to the 10.4 branch and committed revision 730163.

The patch didn't merge cleanly to 10.3 because of conflicts in CollationTest. If someone wants to do the manual back-port to 10.3, we can reopen the issue later, but I'm marking it as resolved and closing it for now.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.4.2.1, 10.5.0.0
>
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Closed: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen closed DERBY-3975.
-------------------------------------


> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.3.3.1, 10.4.2.1, 10.5.0.0
>
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654806#action_12654806 ] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

Fixing this is probably as simple as overloading hashCode() in the CollatorSQL* classes with this implementation (perhaps with some special handling of null):

public int hashCode() {
    return getCollationKey().hashCode();
}

The CollatorSQL* classes use CollationKey.compareTo() to implement Comparable.compareTo() and Object.equals(), so using CollationKey.hashCode() should give the required consistency between compareTo(), equals() and hashCode().

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Affects Version/s: 10.3.1.4

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Attachment: derby-3975-1a.stat
                derby-3975-1a.diff

The attached patch attempts to fix the problem by implementing a hashCode() method in CollatorSQLChar, CollatorSQLVarchar, CollatorSQLLongvarchar and CollatorSQLClob based on CollationKey.hashCode(). It also extends CollationTest.compareAgrave() with a test case for SELECT DISTINCT, and makes it test both CHAR and VARCHAR (previously it only tested VARCHAR). CollationTest fails without the fix and passes with the fix. The test is based on the fact that in the French locale, À (Unicode code point 00C0) is the same as À ('A' + Unicode code point 0300), whereas they are different in UCS_BASIC.

I will start the regression tests now.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen reassigned DERBY-3975:
-----------------------------------------

    Assignee: Knut Anders Hatlen

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654469#action_12654469 ] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

Just a guess, but I think the problem might be that o.a.d.iapi.types.CollatorSQLVarchar doesn't override SQLChar.hashCode().

The reason why the abba/baab duplicate was removed, is that SQLChar.hashCode() just adds up the char values ignoring the positions, so the two strings have the same hash code because they contain the exact same characters (though in different order). Not necessary for this issue, but it would probably be better to calculate with a formula similar to the one described here <URL:http://java.sun.com/javase/6/docs/api/java/lang/String.html#hashCode()>. Just adding the char values will give a poor distribution and the higher bits won't be used unless the string is very long.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654810#action_12654810 ] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

I don't think we store the hash values anywhere, so to my knowledge they just have to be consistent within the lifetime of the JVM. The only place I know that the hash values will go to disk, is when a BackingStoreHashtable spills to disk, but that's just in a temporary file that will be deleted on the next boot, as far as I know.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

       Derby Info:   (was: [Patch Available])
    Fix Version/s: 10.5.0.0

Committed revision 728822.

I'll keep the issue open until the fix has been back-ported to 10.4 and 10.3.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>             Fix For: 10.5.0.0
>
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657373#action_12657373 ] 

Knut Anders Hatlen commented on DERBY-3975:
-------------------------------------------

One thing I forgot to mention in the description of the patch:

The patch changes the hashCode() implementation in CollatorSQL{Char,Varchar,Longvarchar,Clob}, but it only tests CHAR and VARCHAR. That's because LONG VARCHAR and CLOB columns cannot be used in distinct queries without casting them to another data type first, and since you cannot compare two such columns with =, you cannot perform a hash join to test it either. I'm not aware of any other code than distinct scans and hash scans that will ever call the hashCode() methods of these objects, so I don't know of any way to test those two data types.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Attachment: AisBCollatorProvider.java

Here's the CollatorProvider I used to define the locale. To use it, do the following:

$ mkdir -p META-INF/services
$ javac AisBCollatorProvider.java
$ echo AisBCollatorProvider > META-INF/services/java.text.spi.CollatorProvider
$ jar cf aisb-collator.jar AisBCollatorProvider.class META-INF
$ java -Djava.ext.dirs=. -jar /path/to/derbyrun.jar ij

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654807#action_12654807 ] 

Bryan Pendleton commented on DERBY-3975:
----------------------------------------

What are the upgrade implications of changing the hash code implementations of a data type?


> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (DERBY-3975) SELECT DISTINCT may return duplicates with territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-3975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-3975:
--------------------------------------

    Derby Info: [Patch Available]

All the regression tests passed.

> SELECT DISTINCT may return duplicates with territory-based collation
> --------------------------------------------------------------------
>
>                 Key: DERBY-3975
>                 URL: https://issues.apache.org/jira/browse/DERBY-3975
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.4.2.0
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>         Attachments: AisBCollatorProvider.java, derby-3975-1a.diff, derby-3975-1a.stat
>
>
> I defined my own locale (en_US_aisb) where the collation rules said that a=b. When I tried queries with SELECT DISTINCT, they didn't always eliminate all duplicates. Here's an example:
> ij> connect 'jdbc:derby:db;create=true;territory=en_US_aisb;collation=TERRITORY_BASED';
> ij> create table t (x varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values 'a','b','abba','baab','ABBA';
> 5 rows inserted/updated/deleted
> ij> select distinct * from t;
> X         
> ----------
> ABBA      
> b         
> a         
> abba      
> 4 rows selected
> ij> select distinct * from t order by x;
> X         
> ----------
> a         
> abba      
> ABBA      
> 3 rows selected
> The first query did eliminate the duplicate "abba"/"baab", but it did not eliminate the duplicate "a"/"b". When an ORDER BY clause was added (the second query), all the duplicates were eliminated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.