You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/10/05 17:39:28 UTC
[jira] [Updated] (DRILL-1457) Limit operator optimization : push limit operator past exchange operator

     [ https://issues.apache.org/jira/browse/DRILL-1457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-1457:
------------------------------------
    Summary: Limit operator optimization : push limit operator past exchange operator  (was: Limit operator optimization : push limit operator past exchange operator; disable parallel plan if no order is required.)

> Limit operator optimization : push limit operator past exchange operator
> ------------------------------------------------------------------------
>
>                 Key: DRILL-1457
>                 URL: https://issues.apache.org/jira/browse/DRILL-1457
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>            Priority: Critical
>              Labels: no_verified_test
>             Fix For: 1.2.0
>
>         Attachments: 0001-DRILL-1457-Push-Limit-past-through-UnionExchange.patch
>
>
> When there is LIMIT clause in a query, we would want to push down the LIMIT operator as much as possible, so that the upstream operator will stop execution once the desired number of rows are fetched.
> Within one execution fragment, Drill applies a pull model. In many cases, there would be no performance impact if LIMIT operator is not pushed down, since LIMIT would inform the upstream operators to stop. However, in multiple fragments, Drill use a push model.  if LIMIT is not pushed past the exchange operator, and the upstream fragment would continue the execution, until it receives a notice from downstream fragment, even if LIMIT operator has already got the required # of rows.
> For instance:
> explain plan for select * from dfs.`/Users/jni/work/tpch-data/tpch-sf10/lineitem` limit 1;
> +------------+------------+
> | 00-00    Screen
> 00-01      SelectionVectorRemover
> 00-02        Limit(fetch=[1])
> 00-03          UnionExchange
> 01-01            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/jni/work/tpch-data/tpch-sf10/lineitem]], selectionRoot=/Users/jni/work/tpch-data/tpch-sf10/lineitem, columns=[SchemaPath [`*`]]]])
> The query profile shows Scan operator fetches much more records than desired:
> Minor Fragment	Start	End	Total Time	Max Records	Max Batches
> 01-00-xx	0.507	1.059	0.552	43688	8
> 01-01-xx	0.570	1.054	0.484	27305	5
> 01-02-xx	0.617	1.038	0.421	16383	3
> 01-03-xx	0.668	1.056	0.388	10922	2
> 01-04-xx	0.740	1.055	0.315	10922	2
> 01-05-xx	0.813	1.057	0.244	5461	1
> In the above plan,  there would be two choices for performance optimization:
> 1) push the LIMIT operator past through EXCHANGE operator, ideally into SCAN operator. 
> 2) Disable the parallel plan by removing EXCHANGE operator.
>  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)