You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/07/03 13:54:00 UTC

[jira] [Commented] (DRILL-6553) Fix TopN for unnest operator

    [ https://issues.apache.org/jira/browse/DRILL-6553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531440#comment-16531440 ] 

ASF GitHub Bot commented on DRILL-6553:
---------------------------------------

priteshm commented on issue #1353: DRILL-6553: Fix TopN for unnest operator
URL: https://github.com/apache/drill/pull/1353#issuecomment-402165472
 
 
   @HanumathRao can you review this?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Fix TopN for unnest operator
> ----------------------------
>
>                 Key: DRILL-6553
>                 URL: https://issues.apache.org/jira/browse/DRILL-6553
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Volodymyr Vysotskyi
>            Assignee: Volodymyr Vysotskyi
>            Priority: Major
>             Fix For: 1.14.0
>
>
> Plan for the query with unnest is chosen non-optimally:
> {code:sql}
> select customer.c_custkey, customer.c_name, t.o.o_orderkey,t.o.o_totalprice
> from dfs.`lateraljoin/multipleFiles` customer,
> unnest(customer.c_orders) t(o)
> order by customer.c_custkey, t.o.o_orderkey, t.o.o_totalprice
> limit 50
> {code}
> Plan:
> {noformat}
> 00-00    Screen
> 00-01      ProjectAllowDup(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              Sort(sort0=[$0], sort1=[$2], sort2=[$3], dir0=[ASC], dir1=[ASC], dir2=[ASC])
> 00-06                Project(c_custkey=[$2], c_name=[$3], EXPR$2=[ITEM($4, 'o_orderkey')], EXPR$3=[ITEM($4, 'o_totalprice')])
> 00-07                  LateralJoin(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> 00-09                    Project(T0¦¦**=[$0], c_orders=[$1], c_custkey=[$2], c_name=[$3])
> 00-11                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`**`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> 00-08                    Project(c_orders0=[$0])
> 00-10                      Unnest [srcOp=00-07] 
> {noformat}
> A similar query, but with flatten:
> {code:sql}
> select f.c_custkey, f.c_name, f.o.o_orderkey, f.o.o_totalprice from (select c_custkey, c_name, flatten(c_orders) as o from dfs.`lateraljoin/multipleFiles` customer) f order by f.c_custkey, f.o.o_orderkey, f.o.o_totalprice limit 50
> {code}
> has plan:
> {noformat}
> 00-00    Screen
> 00-01      Project(c_custkey=[$0], c_name=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02        SelectionVectorRemover
> 00-03          Limit(fetch=[50])
> 00-04            SelectionVectorRemover
> 00-05              TopN(limit=[50])
> 00-06                Project(c_custkey=[$0], c_name=[$1], EXPR$2=[ITEM($2, 'o_orderkey')], EXPR$3=[ITEM($2, 'o_totalprice')])
> 00-07                  Flatten(flattenField=[$2])
> 00-08                    Project(c_custkey=[$0], c_name=[$1], o=[$2])
> 00-09                      Scan(groupscan=[EasyGroupScan [selectionRoot=file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles, numFiles=2, columns=[`c_custkey`, `c_name`, `c_orders`], files=[file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_2.json, file:/home/mapr/drill/exec/java-exec/target/org.apache.drill.exec.physical.impl.lateraljoin.TestE2EUnnestAndLateral/root/lateraljoin/multipleFiles/cust_order_10_1.json]]])
> {noformat}
> The main difference is that for the case of unnest, a project wasn't pushed to the scan and Limit with Sort weren't converted to TopN. 
> The first problem is tracked by DRILL-6545 and this Jira aims to fix the problem with TopN



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)