You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/02/12 00:00:15 UTC
[jira] [Commented] (DRILL-2170) Wrong result when joining to a subquery that has group-by, order-by and LIMIT

    [ https://issues.apache.org/jira/browse/DRILL-2170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317164#comment-14317164 ] 

Aman Sinha commented on DRILL-2170:
-----------------------------------

[~jnadeau] could you please review ?  

Note that there is a separate issue of the Explain plan not looking 'right' considering that the TopN shown in Fragment 2 (see Explain plan in description) may run in parallel but there is no merge operation after that.   I will create a separate JIRA for it.

> Wrong result when joining to a subquery that has group-by, order-by and LIMIT
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-2170
>                 URL: https://issues.apache.org/jira/browse/DRILL-2170
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.7.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: 0001-DRILL-2170-For-fragment-parallelization-use-max-cost.patch
>
>
> On TPCH SF1:  first query gives right result, second produces wrong result.  The main difference is the order-by in the second query is on the aggregation function, not the grouping key.  
> {code}
> // Correct result 
> : jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 1 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
> +------------+
> |   EXPR$0   |
> +------------+
> | 100        |
> +------------+
> 1 row selected (2.67 seconds)
> // Wrong result
> 0: jdbc:drill:zk=local> select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
> +------------+
> |   EXPR$0   |
> +------------+
> | 400        |
> +------------+
> 1 row selected (3.163 seconds)
> The plan for the second query shows that there's a missing SingleMergeExchange after the TopN operation.  This is needed because before the TopN we did a HashToRandomExchange and since we are doing a LIMIT, we need to merge the output of TopN into a single stream. 
> {code}
> 0: jdbc:drill:zk=local> explain plan for select count(*) from (select l_orderkey, sum(l_quantity), sum(l_extendedprice) from lineitem group by l_orderkey order by 3 limit 100) sq inner join orders o on sq.l_orderkey = o.o_orderkey;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-02        UnionExchange
> 01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02            Project($f0=[0])
> 01-03              HashJoin(condition=[=($0, $1)], joinType=[inner])
> 01-05                HashToRandomExchange(dist0=[[$0]])
> 02-01                  Project(l_orderkey=[$0])
> 02-02                    SelectionVectorRemover
> 02-03                      Limit(fetch=[100])
> 02-04                        SelectionVectorRemover
> 02-05                          TopN(limit=[100])
> 02-06                            HashToRandomExchange(dist0=[[$2]])
> 04-01                              HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
> 04-02                                HashToRandomExchange(dist0=[[$0]])
> 05-01                                  HashAgg(group=[{0}], EXPR$1=[SUM($1)], EXPR$2=[SUM($2)])
> 05-02                                    Project(l_orderkey=[$1], l_quantity=[$2], l_extendedprice=[$0])
> 05-03                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/lineitem]], selectionRoot=/Users/asinha/data/tpch-sf1/lineitem, numFiles=1, columns=[`l_orderkey`, `l_quantity`, `l_extendedprice`]]])
> 01-04                HashToRandomExchange(dist0=[[$0]])
> 03-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpch-sf1/orders]], selectionRoot=/Users/asinha/data/tpch-sf1/orders, numFiles=1, columns=[`o_orderkey`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)