Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/01/26 19:05:34 UTC
[jira] [Updated] (DRILL-2069) Star is not expanded correctly in the query with IN clause containing subquery

     [ https://issues.apache.org/jira/browse/DRILL-2069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-2069:
------------------------------------
    Description: 
t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

The star is expanded incorrectly: the result should contain only the columns from `t1.json` (a1 and b1), but it also includes a2 and a duplicate a10.
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------------+------------+------------+------------+
|     a2     |     a1     |     b1     |    a10     |
+------------+------------+------------+------------+
| aa         | aa         | 1          | aa         |
| bb         | bb         | 2          | bb         |
+------------+------------+------------+------------+
2 rows selected (0.172 seconds)
{code}
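For reference, the expected result can be sketched outside Drill. The snippet below is only a model of standard IN-subquery semantics over the records from this report, not Drill code; the helper names (`load_records`, `in_subquery_filter`) are made up for illustration, and NULL handling is ignored because the sample data contains no nulls.

```python
import json

# Records copied from t1.json and t2.json in this report.
T1_JSON = """
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
"""
T2_JSON = """
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
"""


def load_records(text):
    """Parse one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def in_subquery_filter(outer_rows, outer_key, inner_rows, inner_key):
    """Model: select * from outer where outer_key in (select inner_key from inner).

    Hypothetical helper: the subquery only filters the outer rows,
    so the result keeps the outer columns and nothing else.
    """
    inner_values = {row[inner_key] for row in inner_rows}
    return [dict(row) for row in outer_rows if row[outer_key] in inner_values]


t1 = load_records(T1_JSON)
t2 = load_records(T2_JSON)
expected = in_subquery_filter(t1, "a1", t2, "a2")

for row in expected:
    print(row)
```

Under these assumptions the star should expand to a1 and b1 only, with one row each for "aa" and "bb".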

Explain plan:
{code}
00-01      Project(*=[$0])
00-02        Project(*=[$0])
00-03          HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05            Project(*=[$0], a1=[$1])
00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04            HashAgg(group=[{0}])
00-06              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}
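One way to read the plan above: the IN subquery is rewritten as an inner join against the distinct a2 values, so the join output carries fields from both sides. The sketch below mimics that shape in plain Python and is not a reproduction of Drill's operators; `distinct_values`, `inner_join`, and the hard-coded `left_columns` list are assumptions made for illustration (Drill discovers JSON columns at run time).

```python
# Records copied from t1.json and t2.json in this report.
t1 = [{"a1": "aa", "b1": 1}, {"a1": "bb", "b1": 2}, {"a1": "cc", "b1": 3}]
t2 = [{"a2": "aa", "b2": 1}, {"a2": "bb", "b2": 2}, {"a2": "xx", "b2": 10}]


def distinct_values(rows, key):
    """Stand-in for HashAgg(group=[{0}]): each key value once, first-seen order."""
    seen = []
    for row in rows:
        if row[key] not in seen:
            seen.append(row[key])
    return seen


def inner_join(left_rows, left_key, right_values, right_key):
    """Stand-in for the inner HashJoin: output rows carry fields from both sides."""
    joined = []
    for left in left_rows:
        for value in right_values:
            if left[left_key] == value:
                joined.append({**left, right_key: value})
    return joined


joined = inner_join(t1, "a1", distinct_values(t2, "a2"), "a2")

# Assumption: the left-side column names are known here.
left_columns = ["a1", "b1"]
projected = [{col: row[col] for col in left_columns} for row in joined]

print(joined)     # rows still include the subquery column a2
print(projected)  # rows restricted to the columns of t1
```

In this model, expanding the star over the join output exposes a2 alongside the `t1.json` fields, while restricting the final projection to the left side gives the expected result.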

Workaround: specify the columns explicitly instead of using the star.
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------------+------------+
|     a1     |    a10     |
+------------+------------+
| aa         | aa         |
| bb         | bb         |
+------------+------------+
2 rows selected (0.24 seconds)
{code}

Note to self: include cases like the ones below during verification:
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select a2, b2 from `t2.json`);
+------------+------------+------------+------------+------------+------------+
|     a2     |     b2     |     a1     |     b1     |    a10     |    b10     |
+------------+------------+------------+------------+------------+------------+
| aa         | 1          | aa         | 1          | aa         | 1          |
| bb         | 2          | bb         | 2          | bb         | 2          |
+------------+------------+------------+------------+------------+------------+
2 rows selected (0.323 seconds)

0: jdbc:drill:schema=dfs> select * from `t1.json` t1 where (a1, b1) in (select * from `t2.json`);
Query failed: SqlValidatorException: Values passed to IN operator must have compatible types
Error: exception while executing query: Failure while executing query. (state=,code=0)
{code}
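For the first verification case above, the expected rows can be sketched the same way. This is a model of multi-column IN semantics over the report's data, not Drill code; `in_tuple_subquery` is a hypothetical helper, and NULL handling is ignored. It does not cover the second case, where the subquery itself uses a star.

```python
# Records copied from t1.json and t2.json in this report.
t1 = [{"a1": "aa", "b1": 1}, {"a1": "bb", "b1": 2}, {"a1": "cc", "b1": 3}]
t2 = [{"a2": "aa", "b2": 1}, {"a2": "bb", "b2": 2}, {"a2": "xx", "b2": 10}]


def in_tuple_subquery(outer_rows, outer_keys, inner_rows, inner_keys):
    """Model: select * from outer where (k1, k2) in (select j1, j2 from inner).

    Hypothetical helper: an outer row is kept when its key tuple matches
    some inner key tuple; only the outer columns are returned.
    """
    inner_tuples = {tuple(row[k] for k in inner_keys) for row in inner_rows}
    return [
        dict(row)
        for row in outer_rows
        if tuple(row[k] for k in outer_keys) in inner_tuples
    ]


expected = in_tuple_subquery(t1, ("a1", "b1"), t2, ("a2", "b2"))

for row in expected:
    print(row)
```

Under these assumptions the result should again contain only a1 and b1, rather than the six columns shown in the output above.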

  was:
t1.json
{code}
{ "a1": "aa", "b1": 1 }
{ "a1": "bb", "b1": 2 }
{ "a1": "cc", "b1": 3 }
{code}

t2.json
{code}
{ "a2": "aa", "b2": 1 }
{ "a2": "bb", "b2": 2 }
{ "a2": "xx", "b2": 10 }
{code}

Star is expanded incorrectly, we should get only columns from `t1.json`
{code}
0: jdbc:drill:schema=dfs> select * from `t1.json` where a1 in (select a2 from `t2.json`);
+------------+------------+------------+------------+
|     a2     |     a1     |     b1     |    a10     |
+------------+------------+------------+------------+
| aa         | aa         | 1          | aa         |
| bb         | bb         | 2          | bb         |
+------------+------------+------------+------------+
2 rows selected (0.172 seconds)
{code}

explain plan
{code}
00-01      Project(*=[$0])
00-02        Project(*=[$0])
00-03          HashJoin(condition=[=($1, $2)], joinType=[inner])
00-05            Project(*=[$0], a1=[$1])
00-07              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t1.json, numFiles=1, columns=[`*`], files=[maprfs:/test/t1.json]]])
00-04            HashAgg(group=[{0}])
00-06              Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t2.json, numFiles=1, columns=[`a2`], files=[maprfs:/test/t2.json]]])
{code}

Workaround - specify columns explicitly
{code}
0: jdbc:drill:schema=dfs> select t1.a1, t1.a1 from `t1.json` t1 where t1.a1 in (select t2.a2 from `t2.json` t2);
+------------+------------+
|     a1     |    a10     |
+------------+------------+
| aa         | aa         |
| bb         | bb         |
+------------+------------+
2 rows selected (0.24 seconds)
{code}



> Star is not expanded correctly in the query with IN clause containing subquery
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-2069
>                 URL: https://issues.apache.org/jira/browse/DRILL-2069
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni


