You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2017/07/20 21:54:00 UTC

[jira] [Commented] (DRILL-5681) Incorrect query result when query uses star and correlated subquery

    [ https://issues.apache.org/jira/browse/DRILL-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095433#comment-16095433 ] 

Jinfeng Ni commented on DRILL-5681:
-----------------------------------

The incorrect query result was caused by a wrong query plan, when Drill planner is doing the decorrelation logic. 

For Q1, here is the plan
{code}
00-00    Screen
00-01      ProjectAllowDup(n_nationkey=[$0], n_name=[$1])
00-02        Project(n_nationkey=[ITEM($0, 'n_nationkey')], n_name=[ITEM($0, 'n_name')])
00-03          SelectionVectorRemover
00-04            Filter(condition=[NOT(IS NOT NULL($2))])
00-05              HashJoin(condition=[=($0, $1)], joinType=[left])
00-07                Project(T28¦¦*=[$0])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
00-06                HashAgg(group=[{0}], agg#0=[MIN($1)])
00-08                  Project(T29¦¦*=[$1], $f0=[true])
00-10                    HashJoin(condition=[=($0, $2)], joinType=[inner])
00-12                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, usedMetadataFile=false, columns=[`r_regionkey`]]])
00-11                      Project(T29¦¦*=[$0], $f1=[ITEM($0, 'n_regionkey')])
00-13                        HashAgg(group=[{0}])
00-14                          Project(T29¦¦*=[$0])
00-15                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{code}

Notice that operator 13 (HashAgg) is doing group by over * column, which is wrong. 


> Incorrect query result when query uses star and correlated subquery
> -------------------------------------------------------------------
>
>                 Key: DRILL-5681
>                 URL: https://issues.apache.org/jira/browse/DRILL-5681
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Jinfeng Ni
>
> The following repo was based on a testcase provided by Arjun Rajan(arajan@mapr.com). 
> Drill returns incorrect query result, when the query has a correlated subquery and querying against a view defined with select *, or querying a subquery with select *.  
> Case 1: Querying view with select * + correlated subquery
> {code}
> create view dfs.tmp.region_view as select * from cp.`tpch/region.parquet`;
> {code}
> //Q1 :  return 25 rows. The correct answer is 0 row. 
> {code}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE NOT EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+-----------------+
> | n_nationkey  |     n_name      |
> +--------------+-----------------+
> | 0            | ALGERIA         |
> | 1            | ARGENTINA       |
> | 2            | BRAZIL          |
> ...
> | 24           | UNITED STATES   |
> +--------------+-----------------+
> 25 rows selected (0.614 seconds)
> {code}
> // Q2:  return 0 row. The correct answer is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM  dfs.tmp.nation_view a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.4 seconds)
> {code}
> Case 2: Querying a table expression with select *
> // Q3: return 25 rows. The correct result is 0 row
> {code}
> SELECT n_nationkey, n_name
> FROM  (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE NOT EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+-----------------+
> | n_nationkey  |     n_name      |
> +--------------+-----------------+
> | 0            | ALGERIA         |
> | 1            | ARGENTINA       |
> ...
> | 24           | UNITED STATES   |
> +--------------+-----------------+
> 25 rows selected (0.451 seconds)
> {code}
> Q4: return 0 row. The correct result is 25 rows.
> {code}
> SELECT n_nationkey, n_name
> FROM  (
>   SELECT * FROM cp.`tpch/nation.parquet`
> ) a
> WHERE EXISTS (SELECT 1
> FROM cp.`tpch/region.parquet` b
> WHERE b.r_regionkey =  a.n_regionkey
> )
> +--------------+---------+
> | n_nationkey  | n_name  |
> +--------------+---------+
> +--------------+---------+
> No rows selected (0.515 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)