You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Mamta A. Satoor (Commented) (JIRA)" <ji...@apache.org> on 2011/12/07 06:48:40 UTC
[jira] [Commented] (DERBY-4631) Wrong join column returned by right outer join with NATURAL or USING and territory-based collation

    [ https://issues.apache.org/jira/browse/DERBY-4631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13164165#comment-13164165 ] 

Mamta A. Satoor commented on DERBY-4631:
----------------------------------------

I debugged the code to figure out what (and where) are we doing in the code which causes us to give wrong results for join column in case of territory based database and right outer join with NATURAL or USING clause. As Knut pointed out earlier in this jira, as per the SQL spec, "the join columns in a natural join or in a named columns join should be added to the select list by coalescing the column from the left table with the column from the right table. " What I have found is that Derby decides to pick up join column's value from the left table when we are working with natural left outer join and it picks up the join column's value from the right table when we are working with natural right outer join. This is not a problem when we are dealing with non-territory based databases but the assumption to rely on just one table's join column is incorrect when working with territory based databases. Following is the test case I used for debugging which further explains Derby's current implementation.

connect 'jdbc:derby:db1;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY'; 
create table big(x varchar(5)); 
insert into big values 'A','B', null; 
create table small(x varchar(5)); 
insert into small values 'b','c', null; 
select * from small t1 natural left outer join big t2;
select * from small t1 natural right outer join big t2;

For both natural left outer join and natural right outer join, at execution time, we create a merged row which has columns merged from the left and right tables. The column(in my example, there is only one column)s in the select sql maps to a column from the merged row. The mapping is determined at the sql compile phase. 
In the case of 'select * from small t1 natural left outer join big t2', there will be three merged rows with 2 columns each
'b'  'B'
'c'  null
null null
And for natural left outer join, the generated code has column in the select SQL map to the first column in the merged row. This will always work fine even in a territory based database because as per the SQL standards, the column x should be equivalent to the return value of coalesce(t1.x, t2.x). Since we are working with left outer join, then if the first column in the merged row is null, then even the 2nd column(from the right table) will be null and hence it is ok to always pick up the value from the 1st column in the merged row. This mapping will always lead column x to have the same value as coalesce(t1.x, t2.x).

But for a territory based database, we can't count on a logic like that for natural right outer join. The way Derby works right now, the column x in the select always gets mapped to the 2nd column in the merged row. In the case of 'select * from small t1 natural right outer join big t2', there will be three merged rows with 2 columns each
null 'A'
'b'  'B'
null null
And for natural right outer join, the generated code has column in the select SQL map to the second column in the merged row. This will work fine in a non-territory database, because if column 1 in the merged row has a non-null value, then it will always be the same value as the column 2 in the merged row. But in our example, with territor based database(with SECONDARY strength, meaning it is case insensitive comparison), values 'B' and 'b' are considered equal. Hence the coalesce(t1.x, t2,x) will not be same as value in the 2nd column of the merged row. For natural right outer join with the data given in the example above, 
coalesce(t1.x, t2,x) will return 'A', 'b' and null. But with the mapping of column x in the SELECT to the 2nd column in the merged row will return 'A', 'B' and null thus returning data which does not comply with SQL standard which says that column x's value should be the return value of coalesce(t1.x, t2.x). So it seems like may be we need some of kind projection in case of natural right outer join (rather than simple column mapping to the 2nd column which is what happens right now) so that we look at both the columns in the merged row to determine the value of column x.

Hope this explanation helps understand what Derby is doing internally and based on that, we can come up with some proposal to fix the issue.

                
> Wrong join column returned by right outer join with NATURAL or USING and territory-based collation
> --------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-4631
>                 URL: https://issues.apache.org/jira/browse/DERBY-4631
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.6.1.0
>            Reporter: Knut Anders Hatlen
>              Labels: derby_triage10_8
>
> SQL:2003 says that the join columns in a natural join or in a named
> columns join should be added to the select list by coalescing the
> column from the left table with the column from the right table.
> Section 7.7, <joined table>, syntax rules:
> > 1) Let TR1 be the first <table reference>, and let TR2 be the <table
> > reference> or <table factor> that is the second operand of the
> > <joined table>. Let RT1 and RT2 be the row types of TR1 and TR2,
> > respectively. Let TA and TB be the range variables of TR1 and TR2,
> > respectively. (...)
> and
> > 7) If NATURAL is specified or if a <join specification> immediately
> > containing a <named columns join> is specified, then:
> (...)
> > d) If there is at least one corresponding join column, then let SLCC
> > be a <select list> of <derived column>s of the form
> >
> > COALESCE ( TA.C, TB.C ) AS C
> >
> > for every column C that is a corresponding join column, taken in
> > order of their ordinal positions in RT1.
> For a right outer join, Derby doesn't use COALESCE(TA.C, TB.C), but
> rather just TB.C (the column in the right table) directly.
> This is in most cases OK, because COALESCE(TA.C, TB.C) = TB.C is an
> invariant in a right outer join. (Because TA.C is either NULL or equal
> to TB.C.)
> However, in a database with territory-based collation, equality
> between two values does not mean they are identical, especially now
> that the strength of the collator can be specified (DERBY-1748).
> Take for instance this join:
> ij> connect 'jdbc:derby:testdb;create=true;territory=en_US;collation=TERRITORY_BASED:SECONDARY';
> ij> create table big(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into big values 'A','B','C';
> 3 rows inserted/updated/deleted
> ij> create table small(x varchar(5));
> 0 rows inserted/updated/deleted
> ij> insert into small values 'b','c','d';
> 3 rows inserted/updated/deleted
> ij> select x, t1.x, t2.x, coalesce(t1.x, t2.x) from small t1 natural right outer join big t2;
> X    |X    |X    |4    
> -----------------------
> A    |NULL |A    |A    
> B    |b    |B    |b    
> C    |c    |C    |c    
> 3 rows selected
> I believe that the expected result from the above query is that the
> first column should have the same values as the last column. That is,
> the first column should contain {'A', 'b', 'c'}, not {'A', 'B', 'C'}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira