You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2018/12/12 01:11:00 UTC

[jira] [Commented] (DRILL-6896) Extraneous columns being projected past a join

    [ https://issues.apache.org/jira/browse/DRILL-6896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16718317#comment-16718317 ] 

Aman Sinha commented on DRILL-6896:
-----------------------------------

On further analysis,  I can see a similar (not exactly same) extraneous column projection even on 1.14.0 and with an Inner Join query, with the filter condition in the WHERE clause (instead of ON clause).  

On 1.14.0: 
{noformat}
 explain plan without implementation for 
 select
   c.c_custkey
    from
       cp.`tpch/customer.parquet` c 
       inner join cp.`tpch/orders.parquet` o 
        on c.c_custkey = o.o_custkey
         where o.o_orderkey < 10 and c.c_name like 'abc%'
 ;
+------+------+
| text | json |
+------+------+
DrillScreenRel
  DrillProjectRel(c_custkey=[$0])
    DrillProjectRel(c_custkey=[$2], c_name=[$3], o_custkey=[$0], o_orderkey=[$1])
      DrillJoinRel(condition=[=($2, $0)], joinType=[inner])
        DrillFilterRel(condition=[<($1, 10)])
          DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_orderkey`]]])
        DrillFilterRel(condition=[LIKE($1, 'abc%')])
          DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`, `c_name`]]])
{noformat}

Notice above that the DrillProjectRel above the Join is projecting both c_name and o_orderkey even though both these columns are only needed by the filter conditions below the DrillJoinRel.  There are 2 possible reasons why they were not pushed down: 
1.  Costing.  For whatever reason, the projection pushdown did not change the cost, so Calcite preserved the original non-pushdown plan.  This is less likely compared to #2 below. 
2.  Potential issues with rule application.  There were some issues with Calcite's ProjectMergeRule (see https://issues.apache.org/jira/browse/CALCITE-2223) which required a workaround implemented in [DRILL-6212|https://issues.apache.org/jira/browse/DRILL-6212].  A second JIRA: [DRILL-6501|https://issues.apache.org/jira/browse/DRILL-6501] was filed to revert to using ProjectMergeRule later. 

Considering that 1.14.0 also shows a similar issue (albeit not for the same query),  I am inclined to move this JIRA to 1.16.0 release.  

> Extraneous columns being projected past a join
> ----------------------------------------------
>
>                 Key: DRILL-6896
>                 URL: https://issues.apache.org/jira/browse/DRILL-6896
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.15.0
>            Reporter: Karthikeyan Manivannan
>            Assignee: Aman Sinha
>            Priority: Major
>
> [~rhou] noted that TPCH13 on Drill 1.15 was running slower than Drill 1.14. Analysis revealed that an extra column was being projected in 1.15 and the slowdown was because the extra column was being unnecessarily pushed across an exchange.
> Here is a simplified query written by [~amansinha100] that exhibits the same problem :
> In first plan, o_custkey and o_comment are both extraneous projections. 
>  In the second plan (on 1.14.0), also, there is an extraneous projection: o_custkey but not o_comment.
> On 1.15.0:
> -------------
> {noformat}
> explain plan without implementation for 
>     select
>       c.c_custkey
>     from
>        cp.`tpch/customer.parquet` c 
>          left outer join cp.`tpch/orders.parquet` o 
>       on c.c_custkey = o.o_custkey
>      and o.o_comment not like '%special%requests%'
>    ;
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$2], o_custkey=[$0], o_comment=[$1])
>       DrillJoinRel(condition=[=($2, $0)], joinType=[right])
>         DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>           DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
> {noformat}
> On 1.14.0:
> -------------
> {noformat}
> DrillScreenRel
>   DrillProjectRel(c_custkey=[$0])
>     DrillProjectRel(c_custkey=[$1], o_custkey=[$0])
>       DrillJoinRel(condition=[=($1, $0)], joinType=[right])
>         DrillProjectRel(o_custkey=[$0])
>           DrillFilterRel(condition=[NOT(LIKE($1, '%special%requests%'))])
>             DrillScanRel(table=[[cp, tpch/orders.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/orders.parquet]], selectionRoot=classpath:/tpch/orders.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`o_custkey`, `o_comment`]]])
>         DrillScanRel(table=[[cp, tpch/customer.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/customer.parquet]], selectionRoot=classpath:/tpch/customer.parquet, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`c_custkey`]]])
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)