You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "Knut Anders Hatlen (JIRA)" <ji...@apache.org> on 2010/04/26 15:37:32 UTC

[jira] Created: (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
--------------------------------------------------------------------------------------------------

                 Key: DERBY-4631
                 URL: https://issues.apache.org/jira/browse/DERBY-4631
             Project: Derby
          Issue Type: Bug
          Components: SQL
    Affects Versions: 10.6.0.0
            Reporter: Knut Anders Hatlen


SQL:2003 says that the join columns in a natural join or in a named
columns join should be added to the select list by coalescing the
column from the left table with the column from the right table.

Section 7.7, <joined table>, syntax rules:

> 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> reference> or <table factor> that is the second operand of the
> <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> respectively. Let TA and TB be the range variables of TR1 and TR2,
> respectively. (...)

and

> 7) If NATURAL is specified or if a <join specification> immediately
> containing a <named columns join> is specified, then:
(...)
> d) If there is at least one corresponding join column, then let SLCC
> be a <select list> of <derived column>s of the form
>
> COALESCE ( TA.C, TB.C ) AS C
>
> for every column C that is a corresponding join column, taken in
> order of their ordinal positions in RT1.

For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
rather just TB.C (the column in the right table) directly.

This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
invariant in a right outer join. (Because TA.C is either NULL or equal
to TB.C.)

However, in a database with territory-based collation, equality
between two values does not mean they are identical, especially now
that the strength of the collator can be specified (DERBY-1748).

Take for instance this join:

ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
ij> create table big(x varchar(5));
0 rows inserted/updated/deleted
ij> insert into big values 'A','B','C';
3 rows inserted/updated/deleted
ij> create table small(x varchar(5));
0 rows inserted/updated/deleted
ij> insert into small values 'b','c','d';
3 rows inserted/updated/deleted
ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
X    |X    |X    |4    
-----------------------
A    |NULL |A    |A    
B    |b    |B    |b    
C    |c    |C    |c    

3 rows selected

I believe that the expected result from the above query is that the
first column should have the same values as the last column. That is,
the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-4631:
--------------------------------------

    Bug behavior facts: [Deviation from standard, Wrong query result]  (was: [Deviation from standard])

> Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Posted by "Mamta A. Satoor (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890350#action_12890350 ] 

Mamta A. Satoor commented on DERBY-4631:
----------------------------------------

It seems like what we need is to replace the join column references(which are not tied to the join table specifically) in select column list to COALESCE functions. Additionally, in case of a named join order, the join column names in USING clause should be replaced with COALESCE function. eg of each of the cases
select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2; 
The first column in the select above which is x should be internally replaced with coalesce(t1.x, t2.x) 

SELECT country,count(country) FROM COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country 
The first column in the select and the COUNTRY in USING clause should be replaced with coalesce internally.

I looked at the code a bindExpressions code in impl.sql.compile.JoinNode There, we already have a code to recognize natural join
        if (naturalJoin) {
            usingClause = getCommonColumnsForNaturalJoin();
        }

I think after this code, we should go through join columns in select column list and in USING clause and replace them with corresponding COALESEC functions.

> Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864741#action_12864741 ] 

Knut Anders Hatlen commented on DERBY-4631:
-------------------------------------------

This problem also makes us accept some statements that should not be accepted. For example:

SELECT country,count(country) FROM
COUNTRIES JOIN CITIES USING (COUNTRY) group by countries.country

Here, the query should be rejected because countries.country is not specified in the select list. However, it's accepted because we replace the column country with countries.country instead of coalesce(countries.country, cities.country).

> Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.