You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2014/06/18 15:43:02 UTC

[jira] [Updated] (DERBY-6227) Distinct aggregates don't work well with territory-based collation

     [ https://issues.apache.org/jira/browse/DERBY-6227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-6227:
--------------------------------------

    Attachment: d6227-1a.diff

I think the fix for this bug is to make the list of distinct values hold DataValueDescriptors instead of java.lang.String objects, since the DataValueDescriptor classes implement equals() in a way that takes the database collation into consideration.

The attached patch [^d6227-1a.diff] changes GroupedAggregateResultSet so that it does that. All regression tests ran cleanly with that patch, including a new test case for this bug.

I wasn't quite sure if it was safe to store the DVDs directly in the list of distinct values, or if they would need to be cloned first (because some result sets reuse old DVDs when reading new rows, and we don't want the values in the list to change). I concluded that it was safe to store them directly, because distinct aggregates always take their rows from a sorter and not directly from the result set, and the sorter returns clones of the original DVDs. I added an assert that verifies that the row is read from the sorter, and a comment that says we should check ResultSet.needsToClone() if we ever start reading directly from the result set when processing distinct aggregates.

> Distinct aggregates don't work well with territory-based collation
> ------------------------------------------------------------------
>
>                 Key: DERBY-6227
>                 URL: https://issues.apache.org/jira/browse/DERBY-6227
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0, 10.6.2.1, 10.7.1.1, 10.8.1.2, 10.8.2.2, 10.8.3.0, 10.9.1.0, 10.10.1.1
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>              Labels: derby_triage10_11
>         Attachments: d6227-1a.diff
>
>
>  When working on DERBY-5840, I noticed that GroupedAggregateResultSet would do duplicate elimination by comparing the java.lang.String representation of the values. With territory-based collation, it is possible that two values that have different java.lang.String representation should be considered duplicates, and this logic will produce incorrect results.
> Example:
> ij version 10.10
> ij> connect 'jdbc:derby:memory:db;territory=en_US;collation=TERRITORY_BASED:PRIMARY;create=true';
> ij> create table t(i int, s varchar(10));
> 0 rows inserted/updated/deleted
> ij> insert into t values (1, 'a'), (1, 'a'), (2, 'b'), (2, 'B'), (3, 'a'), (3, 'A'), (3, 'b'), (3, 'B'), (3, 'c');
> 9 rows inserted/updated/deleted
> ij> select distinct s from t;
> S         
> ----------
> b         
> a         
> c         
> 3 rows selected
> ij> select i, count(distinct s) from t group by i;
> I          |2          
> -----------------------
> 1          |1          
> 2          |2          
> 3          |5          
> 3 rows selected
> I would have expected the last query to return
> (1, 1)
> (2, 1)
> (3, 3)



--
This message was sent by Atlassian JIRA
(v6.2#6252)