You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/02/05 18:51:34 UTC

[jira] [Created] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT

Aman Sinha created DRILL-2170:
---------------------------------

             Summary: Wrong result when joining to a subquery that has group-by, order-by and LIMIT
                 Key: DRILL-2170
                 URL: https://issues.apache.org/jira/browse/DRILL-2170
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.7.0
            Reporter: Aman Sinha
            Assignee: Aman Sinha
            Priority: Critical


On TPCH SF1:  first query gives right result, second produces wrong result.  The main difference is the order-by in the second query is on the aggregation function, not the grouping key.  

{code}
// Correct result 

: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 1 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+------------+
|   EXPR$0   |
+------------+
| 100        |
+------------+
1 row selected (2.67 seconds)

// Wrong result

0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+------------+
|   EXPR$0   |
+------------+
| 400        |
+------------+
1 row selected (3.163 seconds)

Both queries should be having identical plans, but the plan for the second query shows a missing SingleMergeExchange after the TopN operation. 

{code}
0: jdbc:drill:zk=local> explain plan for select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-02        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
01-05                HashToRandomExchange(dist0=[[$0]])
02-01                  Project(l_orderkey=[$0])
02-02                    SelectionVectorRemover
02-03                      Limit(fetch=[100])
02-04                        SelectionVectorRemover
02-05                          TopN(limit=[100])
02-06                            HashToRandomExchange(dist0=[[$2]])
04-01                              HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
04-02                                HashToRandomExchange(dist0=[[$0]])
05-01                                  HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
05-02                                    Project(l_orderkey=[$1], l_quantity=[$2], l_extendedprice=[$0])
05-03                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/lineitem]], selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, columns=[`l_orderkey`, `l_quantity`, `l_extendedprice`]]])
01-04                HashToRandomExchange(dist0=[[$0]])
03-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/orders]], selectionRoot=/Users/asinha/data/tpch-sf1/orders, numFiles=1, columns=[`o_orderkey`]]])
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)