Posted to dev@drill.apache.org by "Karthikeyan Manivannan (JIRA)" <ji...@apache.org> on 2018/12/11 23:02:00 UTC

[jira] [Created] (DRILL-6896) Extraneous columns being projected in Drill 1.15

Karthikeyan Manivannan created DRILL-6896:
---------------------------------------------

             Summary: Extraneous columns being projected in Drill 1.15
                 Key: DRILL-6896
                 URL: https://issues.apache.org/jira/browse/DRILL-6896
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.15.0
            Reporter: Karthikeyan Manivannan
            Assignee: Aman Sinha


[~rhou] noted that TPCH13 ran slower on Drill 1.15 than on Drill 1.14. Analysis revealed that 1.15 was projecting an extra column, and the slowdown came from that column being unnecessarily pushed across an exchange.

Here is a simplified query, written by [~amansinha100], that exhibits the same problem:

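In the first plan (on 1.15.0), o_custkey and o_comment are both extraneous projections.
In the second plan (on 1.14.0), o_custkey is still an extraneous projection, but o_comment is not: it is pruned by the DrillProjectRel(o_custkey=[$0]) above the filter, before the join.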

On 1.15.0:

-------------

explain plan without implementation for
select
  c.c_custkey
from
  cp.`tpch/customer.parquet` c
  left outer join cp.`tpch/orders.parquet` o
  on c.c_custkey = o.o_custkey
  and o.o_comment not like '%special%requests%'
;

DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
      DrillJoinRel(condition=[=($2, $0)], joinType=[right])
        DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
          DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
        DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])

On 1.14.0:

-----------------

DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
      DrillJoinRel(condition=[=($1, $0)], joinType=[right])
        DrillProjectRel(o_custkey=[$0])
          DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
            DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
        DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
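To make "extraneous" concrete, here is a minimal Python sketch (not Drill or Calcite code) that models the two simplified plans above as trees and passes the set of columns each operator's consumers actually need from the top down, in the style of a column-pruning pass. Node, find_extraneous, plan_115 and plan_114 are hypothetical names; columns are referenced by name rather than by Drill's $n ordinals, and everything except column usage is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified plan node (not Drill's or Calcite's RelNode API)."""
    kind: str                                   # "Project", "Filter", "Join" or "Scan"
    label: str                                  # name used in the report
    outputs: list                               # columns this node emits, in order
    reads: set = field(default_factory=set)     # columns its own condition references
    children: list = field(default_factory=list)


def find_extraneous(node, required, kinds=("Project",), report=None):
    """Walk the plan top-down with the set of columns the ancestors need and
    record, for nodes of the given kinds, emitted columns no ancestor needs."""
    if report is None:
        report = {}
    extra = [c for c in node.outputs if c not in required]
    if extra and node.kind in kinds:
        report[node.label] = extra
    if node.kind == "Project":
        # A project only needs the inputs behind the outputs that are used;
        # columns keep their names in this model, so input name == output name.
        child_required = {c for c in node.outputs if c in required}
    else:
        # Filters and joins pass columns through and also read their own condition.
        child_required = set(required) | node.reads
    for child in node.children:
        find_extraneous(child, child_required, kinds, report)
    return report


def plan_115():
    """The 1.15.0 plan: no project between the filter and the join."""
    orders = Node("Scan", "orders scan", ["o_custkey", "o_comment"])
    filt = Node("Filter", "Filter", ["o_custkey", "o_comment"], {"o_comment"}, [orders])
    customer = Node("Scan", "customer scan", ["c_custkey"])
    join = Node("Join", "Join", ["o_custkey", "o_comment", "c_custkey"],
                {"o_custkey", "c_custkey"}, [filt, customer])
    inner = Node("Project", "Project(c_custkey, o_custkey, o_comment)",
                 ["c_custkey", "o_custkey", "o_comment"], children=[join])
    return Node("Project", "Project(c_custkey)", ["c_custkey"], children=[inner])


def plan_114():
    """The 1.14.0 plan: Project(o_custkey) prunes o_comment right after the filter."""
    orders = Node("Scan", "orders scan", ["o_custkey", "o_comment"])
    filt = Node("Filter", "Filter", ["o_custkey", "o_comment"], {"o_comment"}, [orders])
    pruned = Node("Project", "Project(o_custkey)", ["o_custkey"], children=[filt])
    customer = Node("Scan", "customer scan", ["c_custkey"])
    join = Node("Join", "Join", ["o_custkey", "c_custkey"],
                {"o_custkey", "c_custkey"}, [pruned, customer])
    inner = Node("Project", "Project(c_custkey, o_custkey)",
                 ["c_custkey", "o_custkey"], children=[join])
    return Node("Project", "Project(c_custkey)", ["c_custkey"], children=[inner])


if __name__ == "__main__":
    # The screen only consumes c_custkey.
    print(find_extraneous(plan_115(), {"c_custkey"}))
    # -> {'Project(c_custkey, o_custkey, o_comment)': ['o_custkey', 'o_comment']}
    print(find_extraneous(plan_114(), {"c_custkey"}))
    # -> {'Project(c_custkey, o_custkey)': ['o_custkey']}
```

In both versions the project above the join emits o_custkey that nothing above it consumes; only the 1.15.0 plan also carries o_comment through the join (pass kinds=("Join",) to see it in the join's output), because it lacks the Project(o_custkey) above the filter. The sketch only reproduces this bookkeeping; it says nothing about which planner rule introduces or should remove these columns.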



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)