You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/05/22 20:01:17 UTC
[jira] [Comment Edited] (DRILL-2988) Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified

    [ https://issues.apache.org/jira/browse/DRILL-2988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14533690#comment-14533690 ] 

Victoria Markman edited comment on DRILL-2988 at 5/22/15 6:00 PM:
------------------------------------------------------------------

There is one more case that is similar to this one, with the difference that we throw an error in this case:

Notice, that columns in join condition are not qualified:
{code}
0: jdbc:drill:schema=dfs> select * from t1 where a1 > (select avg(a2)*100 from t2 where c1 = c2);
Query failed: SYSTEM ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
[991f5544-d44f-4ba2-aa70-783a88a5b16a on atsqa4-133.qa.lab:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)

Correlated scalar aggregation without expression:

0: jdbc:drill:schema=dfs> select * from t1 where a1 > (select avg(a2) from t2 where c1 = c2);
Query failed: SYSTEM ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join
[f238c0c0-efea-4684-8ec3-899fbcc2a075 on atsqa4-133.qa.lab:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}


was (Author: vicky):
There is one more case that is similar to this one, with the difference that we throw and error in this case:

Notice, that columns in join condition are not qualified:
{code}
0: jdbc:drill:schema=dfs> select * from t1 where a1 > (select avg(a2)*100 from t2 where c1 = c2);
Query failed: SYSTEM ERROR: Non-scalar sub-query used in an expression
See Apache Drill JIRA: DRILL-1937
[991f5544-d44f-4ba2-aa70-783a88a5b16a on atsqa4-133.qa.lab:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)

Correlated scalar aggregation without expression:

0: jdbc:drill:schema=dfs> select * from t1 where a1 > (select avg(a2) from t2 where c1 = c2);
Query failed: SYSTEM ERROR: This query cannot be planned possibly due to either a cartesian join or an inequality join
[f238c0c0-efea-4684-8ec3-899fbcc2a075 on atsqa4-133.qa.lab:31010]
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}

> Correlated exists subquery returns wrong result if join columns in subquery are not fully qualified
> ---------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2988
>                 URL: https://issues.apache.org/jira/browse/DRILL-2988
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>              Labels: document_if_not_fixed
>
> Two parquet tables:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.109 seconds)
> 0: jdbc:drill:schema=dfs> select * from t3;
> +------------+------------+------------+
> |     a3     |     b3     |     c3     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.106 seconds)
> {code}
> When column name is not qualified with the table name in the join condition, we get wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where a1 = a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.177 seconds)
> {code}
> Works correctly when column names are fully qualified ...
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where exists (select * from t3 where t1.a1 = t3.a3);
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> +------------+------------+------------+
> 3 rows selected (0.353 seconds)
> {code}
> Plan for the query with the wrong result:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from t1 where exists (select * from t3 where a1 = a3);
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T124¦¦*=[$0])
> 00-03          NestedLoopJoin(condition=[true], joinType=[inner])
> 00-05            Project(T124¦¦*=[$0])
> 00-07              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`*`]]])
> 00-04            SelectionVectorRemover
> 00-06              Filter(condition=[IS NOT NULL($0)])
> 00-08                StreamAgg(group=[{}], agg#0=[MIN($0)])
> 00-09                  Project($f0=[true])
> 00-10                    SelectionVectorRemover
> 00-11                      Filter(condition=[=($0, $1)])
> 00-12                        Project(a1=[$1], a3=[$0])
> 00-13                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t3]], selectionRoot=/drill/testdata/aggregation/t3, numFiles=1, columns=[`a1`, `a3`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)