Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/06/02 19:56:01 UTC
[jira] [Created] (DRILL-885) Handle project pushdown for constant expressions
Aman Sinha created DRILL-885:
--------------------------------
Summary: Handle project pushdown for constant expressions
Key: DRILL-885
URL: https://issues.apache.org/jira/browse/DRILL-885
Project: Apache Drill
Issue Type: Bug
Reporter: Aman Sinha
In the following query, the Explain plan shows that the node Project($f0=[0]) projects only a constant. Ideally, neither side of the join should then have to produce any columns other than those needed for the join condition. Currently, however, the Scans below the join still produce those unnecessary columns (see the Customer parquet scan on the left side of the HashJoin). This hurts performance.
0: jdbc:drill:zk=local> explain plan for select count(*) from (select c.c_custkey, c.c_name, c.c_address, c.c_nationkey, c.c_phone, c.c_acctbal, c.c_mktsegment, c.c_comment, n.n_nationkey, n.n_name, n.n_nationkey, n.n_comment from cp.`tpch/customer.parquet` c JOIN cp.`tpch/nation.parquet` n ON (c.c_nationkey = n.n_nationkey));
+------------+------------+
| text | json |
+------------+------------+
| 00-00    Screen
00-01      StreamAgg(group=[{}], EXPR$0=[SUM($0)])
00-02        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              HashJoin(condition=[=($1, $10)], joinType=[inner])
01-05                HashToRandomExchange(dist0=[[$1]])
02-01                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/customer.parquet]], selectionRoot=/tpch/customer.parquet, columns=[SchemaPath [`c_nationkey`], SchemaPath [`c_custkey`], SchemaPath [`c_name`], SchemaPath [`c_address`], SchemaPath [`c_phone`], SchemaPath [`c_acctbal`], SchemaPath [`c_mktsegment`], SchemaPath [`c_comment`]]]])
01-04                Project(*0=[$0], n_nationkey=[$1], n_name=[$2], n_comment=[$3])
01-06                  HashToRandomExchange(dist0=[[$1]])
03-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=/tpch/nation.parquet]], selectionRoot=/tpch/nation.parquet, columns=[SchemaPath [`n_nationkey`], SchemaPath [`n_name`], SchemaPath [`n_comment`]]]])
Here's the Drill Logical plan for the same query:
| DrillScreenRel
    DrillAggregateRel(group=[{}], EXPR$0=[COUNT()])
      DrillProjectRel($f0=[0])
        DrillJoinRel(condition=[=($1, $10)], joinType=[inner])
          DrillScanRel(table=[[cp, tpch/customer.parquet]])
          DrillScanRel(table=[[cp, tpch/nation.parquet]])
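To illustrate the pruning that would be desirable here, below is a rough Python sketch (not Drill or Optiq code) of how the set of required join inputs could be computed when the Project above the join contains only constants. All names (InputRef, Literal, required_join_inputs) are hypothetical. The left input width of 9 is an assumption inferred from the join condition =($1, $10) together with the right-side Project, where n_nationkey is field $1; it is not stated in the plan.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputRef:
    """Hypothetical stand-in for a field reference such as $1."""
    index: int


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a constant expression such as 0."""
    value: object


def referenced_fields(exprs):
    """Collect the input field indexes used by a list of expressions.

    Literals reference no input fields, so an all-constant Project
    contributes nothing to the required set.
    """
    return {e.index for e in exprs if isinstance(e, InputRef)}


def required_join_inputs(project_exprs, join_condition_refs, left_width):
    """Return (left_fields, right_fields) that the join inputs must produce.

    Needed fields are those used by the Project above the join plus those
    used by the join condition. Indexes at or beyond left_width belong to
    the right input and are shifted to be relative to that input.
    """
    needed = referenced_fields(project_exprs) | set(join_condition_refs)
    left = sorted(i for i in needed if i < left_width)
    right = sorted(i - left_width for i in needed if i >= left_width)
    return left, right


# Project($f0=[0]) over a join with condition =($1, $10).
# left_width=9 is an assumption (see the note above the code).
left, right = required_join_inputs([Literal(0)], [1, 10], left_width=9)
print(left, right)
```

Under these assumptions, only the join key on each side is required, so each Scan would need to read just c_nationkey and n_nationkey respectively.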