You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Bridget Bevens (JIRA)" <ji...@apache.org> on 2018/08/01 20:22:00 UTC
[jira] [Commented] (DRILL-6574) Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)

    [ https://issues.apache.org/jira/browse/DRILL-6574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16565924#comment-16565924 ] 

Bridget Bevens commented on DRILL-6574:
---------------------------------------

Hi [~KazydubB] and [~vvysotskyi],

I’ve create a [Google doc for LIMIT 0|https://docs.google.com/document/d/1CUPwDI1OY4N7JBTrngb49bSCi1hqDyYE0iKgtXmZ1rQ/edit?usp=sharing] because there is no documentation for it in the Apache Drill doc set. 
I’ve include the optimization options in the doc. 
Can you please review the document and provide feedback? 
I plan to add the info to the LIMIT clause page: https://drill.apache.org/docs/limit-clause/ 

Thanks,
Bridget



> Add option to push LIMIT(0) on top of SCAN (late limit 0 optimization)
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6574
>                 URL: https://issues.apache.org/jira/browse/DRILL-6574
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.13.0
>            Reporter: Bohdan Kazydub
>            Assignee: Bohdan Kazydub
>            Priority: Major
>              Labels: doc-impacting, ready-to-commit
>             Fix For: 1.14.0
>
>
> Currently we have early limit 0 optimization (planner.enable_limit0_optimization) which determines query data types before actual scan. Since we not always able to determine data type during planning, we need to add one more option to enable late limit 0 optimization (planner.enable_limit0_on_scan, exit query right after scan. LIMIT(0) on SCAN for UNION and complex functions will be disabled i.e. UNION and complex functions need data to produce result schema. This would not work for the following list of functions: KVGEN, MAPPIFY, FLATTEN, CONVERT_FROMJSON, CONVERT_TOJSON, CONVERT_TOSIMPLEJSON, CONVERT_TOEXTENDEDJSON.
> Query plan examples:
> For query
> {code:java}
> SELECT * FROM (
>   SELECT l.l_quantity, l.l_shipdate, o.o_custkey 
>   FROM cp.`tpch/lineitem.parquet` l 
>   JOIN cp.`tpch/orders.parquet` o ON l.l_orderkey = o.o_orderkey 
>   LIMIT 2) 
> LIMIT 0
> {code}
> {color:#6a8759}plan after changes looks like{color}
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.1 rows, 210559.1 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 527
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75183.0 rows, 210559.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 526
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75182.0 rows, 210556.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 525
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 524
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{75181.0 rows, 210555.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 523
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{75179.0 rows, 210547.0 cpu, 0.0 io, 0.0 network, 17.6 memory}, id = 522
> 00-07                SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60176.0 rows, 180526.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 518
> 00-09                  Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 1.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 517
> 00-11                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 516
> 00-06                SelectionVectorRemover : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15001.0 rows, 30001.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 521
> 00-08                  Limit(offset=[0], fetch=[0]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 520
> 00-10                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 519
> {noformat}
> and before changes:
> {noformat}
> 00-00    Screen : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.1 rows, 1052637.1 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 452
> 00-01      Project(l_quantity=[$1], l_shipdate=[$2], o_custkey=[$4]) : rowType = RecordType(ANY l_quantity, ANY l_shipdate, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150354.0 rows, 1052637.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 451
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150353.0 rows, 1052634.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 450
> 00-03          Limit(fetch=[0]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 1.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 449
> 00-04            Limit(fetch=[2]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 2.0, cumulative cost = \{150352.0 rows, 1052633.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 448
> 00-05              HashJoin(condition=[=($0, $3)], joinType=[inner]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate, ANY o_orderkey, ANY o_custkey): rowcount = 60175.0, cumulative cost = \{150350.0 rows, 1052625.0 cpu, 0.0 io, 0.0 network, 264000.0 memory}, id = 447
> 00-07                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], selectionRoot=classpath:/tpch/lineitem.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`l_orderkey`, `l_quantity`, `l_shipdate`]]]) : rowType = RecordType(ANY l_orderkey, ANY l_quantity, ANY l_shipdate): rowcount = 60175.0, cumulative cost = \{60175.0 rows, 180525.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_orderkey`, `o_custkey`]]]) : rowType = RecordType(ANY o_orderkey, ANY o_custkey): rowcount = 15000.0, cumulative cost = \{15000.0 rows, 30000.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> {noformat}
> Also both early and late limit 0 optimizations will be enabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)