You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Atri Sharma (JIRA)" <ji...@apache.org> on 2018/05/21 06:23:00 UTC

[jira] [Commented] (CALCITE-2130) Converting subquery to join is not always giving equivalent behavior

    [ https://issues.apache.org/jira/browse/CALCITE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16482226#comment-16482226 ] 

Atri Sharma commented on CALCITE-2130:
--------------------------------------

[~julianhyde] I will like to take a crack at this one, if that is fine

> Converting subquery to join is not always giving equivalent behavior
> --------------------------------------------------------------------
>
>                 Key: CALCITE-2130
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2130
>             Project: Calcite
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.15.0
>            Reporter: Paul Jackson
>            Assignee: Julian Hyde
>            Priority: Major
>
> Starting with this DDL:
> {code:SQL}
> create table friends (
>   id int not null,
>   friend_id int,
>   primary key (id),
>   foreign key (friend_id) references friends (id)
> );
> insert into friends values
> (210, null),
> (209, 210),
> (202, 209),
> (208, 202),
> (207, 209),
> (203, 207),
> (201, null),
> (204, null),
> (205, null),
> (206, 209);
> {code}
> This query looks up a row by id and performs another lookup for id by the friend id:
> {code:SQL}
> SELECT id,
>  (SELECT id
>   FROM friends as f2
>   WHERE f2.friend_id = f1.friend_id) AS friend
> FROM friends as f1
> WHERE id = '203'
> id	friend_id	friend
> 203	207	203
> {code}
> This query only returns a result for rows with unique values for friend_id. Replacing 203 with 202 leads to this error (MySql): {{Error Code: 1242. Subquery returns more than 1 row}}
> I converted this query to a RelNode and then converted it back to SQL using the MySQL dialect:
> {code:Java}
> Planner aPlanner = Frameworks.getPlanner(aConfig);
> SqlNode aQuery = aPlanner.parse(theSql);
> aQuery = aPlanner.validate(aQuery);
> RelNode aRelNode = aPlanner.rel(aQuery).project();
> RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
> SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
> {code}
> This gives the following plan and SQL:
> {code}
> LogicalProject(id=[$0], friend_id=[$1], friend=[$2])
>   LogicalProject(id=[$0], friend_id=[$1], $f0=[$3])
>     LogicalJoin(condition=[=($1, $2)], joinType=[left])
>       LogicalFilter(condition=[=($0, '203')])
>         JdbcTableScan(table=[[stardog, friends]])
>       LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)])
>         LogicalProject(friend_id=[$1], id=[$0])
>           LogicalProject(id=[$0], friend_id=[$1])
>             LogicalFilter(condition=[IS NOT NULL($1)])
>               JdbcTableScan(table=[[stardog, friends]])
> SELECT `t`.`id`, `t`.`friend_id`, `t2`.`$f1` AS `$f0`
> FROM (SELECT *
> FROM `stardog`.`friends`
> WHERE `id` = '203') AS `t`
> LEFT JOIN (SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
> UNION ALL
> SELECT NULL) END AS `$f1`
> FROM `stardog`.`friends`
> WHERE `friend_id` IS NOT NULL
> GROUP BY `friend_id`) AS `t2` ON `t`.`friend_id` = `t2`.`friend_id`
> {code}
> The MySQL implementation for SINGLE_VALUE is the CASE clause that causes an 1242 error when the id count is greater than 1 by invoking a UNION ALL on two NULL rows. In theory, this should return the ID when it is a unique value and throw an error when there are multiple. Instead, MySQL will return the 1242 error for all values of id, including 203.
> Note, the JOIN subquery works if you add a WHERE clause expression to constrain the value of freind_id:
> {code:SQL}
> SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
> UNION ALL
> SELECT NULL) END AS `$f1`
> FROM `stardog`.`friends`
> WHERE `friend_id` IS NOT NULL AND `friend_id` = '207'
> GROUP BY `friend_id`
> {code}
> Substituting friend_id for 207 leads to the 1242 error, as intended.
> This JOIN works on some dialects, but I think it is because different dialects can use different join implementations. If the join performs the JOIN ON SELECT clause without adding a where clause expression to constrain friend_id (collecting all rows with a non-null friend_id), it will encounter this error. Implementations that use a join algorithm that does constrain the friend_id to the desired value will experience the error only when there are multiple rows with the same friend_id, which behaves like the original query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)