You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2016/03/07 23:56:40 UTC

[jira] [Comment Edited] (DRILL-4477) Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM

    [ https://issues.apache.org/jira/browse/DRILL-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15183749#comment-15183749 ] 

Victoria Markman edited comment on DRILL-4477 at 3/7/16 10:55 PM:
------------------------------------------------------------------

Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                             commit_message                              |        commit_time         | build_email  |         build_time         |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost adjustment for $SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC  | Unknown      | 07.01.2016 @ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

Star is not the factor here either. I get wrong result even when I explicitly name columns:
{code}
0: jdbc:drill:drillbit=localhost> select a,b,c,d,e,f from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c) as t(a,b,c,d,e,f);
+----+----+----+----+----+----+
| a  | b  | c  | d  | e  | f  |
+----+----+----+----+----+----+
| a  | b  | 1  | a  | b  | 1  |
+----+----+----+----+----+----+
1 row selected (0.342 seconds)
{code}

I went all the way back to 1.0.0 to see if it reproduces there, because I was convinced that I've seen exactly the same problem before. It turns out that it reproduces there as well. Perplexing that we did not find it until now ... 


was (Author: vicky):
Same in 1.4.0:

{code}
0: jdbc:drill:drillbit=localhost> select * from sys.version;
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| version  |                 commit_id                 |                             commit_message                              |        commit_time         | build_email  |         build_time         |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
| 1.4.0    | 556d9c5dbf4b0f776b57f0d39d07111ccc0d2e7b  | MD-649: do the cost adjustment for $SUM0 only when LIMIT 0 is present.  | 07.01.2016 @ 18:19:31 UTC  | Unknown      | 07.01.2016 @ 19:52:20 UTC  |
+----------+-------------------------------------------+-------------------------------------------------------------------------+----------------------------+--------------+----------------------------+
1 row selected (1.082 seconds)

0: jdbc:drill:drillbit=localhost> select * from (select t1.a, t1.b, t1.c, t2.a, t2.b, t2.c from `tt1.json` t1, `tt2.json` t2 where t1.c = t2.c);
+----+----+----+-----+-----+-----+
| a  | b  | c  | a0  | b0  | c0  |
+----+----+----+-----+-----+-----+
| a  | b  | 1  | a   | b   | 1   |
+----+----+----+-----+-----+-----+
1 row selected (2.005 seconds)
{code}

> Wrong Plan (potentially wrong result) if wrapping a query with SELECT * FROM
> ----------------------------------------------------------------------------
>
>                 Key: DRILL-4477
>                 URL: https://issues.apache.org/jira/browse/DRILL-4477
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Sean Hsuan-Yi Chu
>            Assignee: Sean Hsuan-Yi Chu
>            Priority: Blocker
>             Fix For: 1.6.0
>
>         Attachments: t1.json, t2.json
>
>
> For example, a query  
> {code}
> select * from (select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v
> on (s.name = v.name) 
> where s.age < 30) t 
> {code}
> gives a plan as below:
> {code}
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$0], name0=[$0], registration=[$3])
> 00-03          Project(name=[$2], age=[$3], name0=[$0], registration=[$1])
> 00-04            HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-06              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-05              Project(name0=[$0], age=[$1])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}
> In the line 00-02, both name and name0 point at the same incoming column (probably due to the JOIN CONDITION). 
> However. the fact that these two are the JOIN condition does not make a case that they must be equal since implicit casting might be invoked to perform the JOIN condition.
> Interestingly, if the SELECT * FROM wrapper is removed, this bug won't be exposed: 
> {code}
> select s.name, v.name, v.registration from cp.`tpch/region.parquet` s left outer join cp.`tpch/nation.parquet` v on (s.name = v.name) 
> where s.age < 30
> {code}
> gives 
> {code}
> 00-00    Screen
> 00-01      Project(name=[$0], name0=[$1], registration=[$2])
> 00-02        Project(name=[$2], name0=[$0], registration=[$1])
> 00-03          HashJoin(condition=[=($2, $0)], joinType=[right])
> 00-05            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `registration`]]])
> 00-04            Project(name0=[$0])
> 00-06              Project(name=[$0])
> 00-07                SelectionVectorRemover
> 00-08                  Filter(condition=[<($1, 30)])
> 00-09                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`name`, `age`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)