You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@calcite.apache.org by "Paul Jackson (JIRA)" <ji...@apache.org> on 2018/01/10 19:32:00 UTC

[jira] [Created] (CALCITE-2130) Converting subquery to join is not always giving equivalent behavior

Paul Jackson created CALCITE-2130:
-------------------------------------

             Summary: Converting subquery to join is not always giving equivalent behavior
                 Key: CALCITE-2130
                 URL: https://issues.apache.org/jira/browse/CALCITE-2130
             Project: Calcite
          Issue Type: Bug
          Components: core
    Affects Versions: 1.15.0
            Reporter: Paul Jackson
            Assignee: Julian Hyde


Starting with this DDL:
{code:SQL}
create table friends (
  id int not null,
  friend_id int,
  primary key (id),
  foreign key (friend_id) references friends (id)
);

insert into friends values
(210, null),
(209, 210),
(202, 209),
(208, 202),
(207, 209),
(203, 207),
(201, null),
(204, null),
(205, null),
(206, 209);
{code}

This query looks up a row by id and performs another lookup for id by the friend id:
{code:SQL}
SELECT id,
 (SELECT id
  FROM friends as f2
  WHERE f2.friend_id = f1.friend_id) AS friend
FROM friends as f1
WHERE id = '203'

id	friend_id	friend
203	207	203
{code}

This query only returns a result for rows with unique values for friend_id. Replacing 203 with 202 leads to this error (MySql): {{Error Code: 1242. Subquery returns more than 1 row}}

I converted this query to a RelNode and then converted it back to SQL using the MySQL dialect:
{code:Java}
Planner aPlanner = Frameworks.getPlanner(aConfig);
SqlNode aQuery = aPlanner.parse(theSql);
aQuery = aPlanner.validate(aQuery);
RelNode aRelNode = aPlanner.rel(aQuery).project();
RelToSqlConverter aSqlConverter = new RelToSqlConverter(aSqlDialect);
SqlNode aSqlNode = aSqlConverter.visitChild(0, aRelNode).asStatement();
{code}

This gives the following plan and SQL:
{code}
LogicalProject(id=[$0], friend_id=[$1], friend=[$2])
  LogicalProject(id=[$0], friend_id=[$1], $f0=[$3])
    LogicalJoin(condition=[=($1, $2)], joinType=[left])
      LogicalFilter(condition=[=($0, '203')])
        JdbcTableScan(table=[[stardog, friends]])
      LogicalAggregate(group=[{0}], agg#0=[SINGLE_VALUE($1)])
        LogicalProject(friend_id=[$1], id=[$0])
          LogicalProject(id=[$0], friend_id=[$1])
            LogicalFilter(condition=[IS NOT NULL($1)])
              JdbcTableScan(table=[[stardog, friends]])

SELECT `t`.`id`, `t`.`friend_id`, `t2`.`$f1` AS `$f0`
FROM (SELECT *
FROM `stardog`.`friends`
WHERE `id` = '203') AS `t`
LEFT JOIN (SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
UNION ALL
SELECT NULL) END AS `$f1`
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL
GROUP BY `friend_id`) AS `t2` ON `t`.`friend_id` = `t2`.`friend_id`
{code}

The MySQL implementation for SINGLE_VALUE is the CASE clause that causes an 1242 error when the id count is greater than 1 by invoking a UNION ALL on two NULL rows. In theory, this should return the ID when it is a unique value and throw an error when there are multiple. Instead, MySQL will return the 1242 error for all values of id, including 203.

Note, the JOIN subquery works if you add a WHERE clause expression to constrain the value of freind_id:
{code:SQL}
SELECT `friend_id`, CASE COUNT(`id`) WHEN 0 THEN NULL WHEN 1 THEN `id` ELSE (SELECT NULL
UNION ALL
SELECT NULL) END AS `$f1`
FROM `stardog`.`friends`
WHERE `friend_id` IS NOT NULL AND `friend_id` = '207'
GROUP BY `friend_id`
{code}

Substituting friend_id for 207 leads to the 1242 error, as intended.

This JOIN works on some dialects, but I think it is because different dialects can use different join implementations. If the join performs the JOIN ON SELECT clause without adding a where clause expression to constrain friend_id (collecting all rows with a non-null friend_id), it will encounter this error. Implementations that use a join algorithm that does constrain the friend_id to the desired value will experience the error only when there are multiple rows with the same friend_id, which behaves like the original query.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)