You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Boaz Ben-Zvi (JIRA)" <ji...@apache.org> on 2019/01/29 04:15:00 UTC
[jira] [Created] (DRILL-7013) Hash-Join and Hash-Aggr to handle incoming with selection vectors

Boaz Ben-Zvi created DRILL-7013:
-----------------------------------

             Summary: Hash-Join and Hash-Aggr to handle incoming with selection vectors
                 Key: DRILL-7013
                 URL: https://issues.apache.org/jira/browse/DRILL-7013
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators, Query Planning &amp; Optimization
    Affects Versions: 1.15.0
            Reporter: Boaz Ben-Zvi


  The Hash-Join and Hash-Aggr operators copy each incoming row separately. When the incoming data has a selection vector (e.g., outgoing from a Filter), a _SelectionVectorRemover_ is added before the Hash operator, as the latter cannot handle the selection vector.  

  Thus every row is needlessly being copied twice!

+Suggestion+: Enhance the Hash operators to handle potential incoming selection vectors, thus eliminating  the need for the extra copy. The planner needs to be changed not to add that SelectionVectorRemover.

For example:
{code:sql}
select * from cp.`tpch/lineitem.parquet` L,  cp.`tpch/orders.parquet` O where O.o_custkey > 1498 and L.l_orderkey > 58999 and O.o_orderkey = L.l_orderkey 
{code}
And the plan:
{panel}
00-00 Screen : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): 
 00-01 ProjectAllowDup(**=[$0], **0=[$1]) : rowType = RecordType(DYNAMIC_STAR **, DYNAMIC_STAR **0): 
 00-02 Project(T44¦¦**=[$0], T45¦¦**=[$2]) : rowType = RecordType(DYNAMIC_STAR T44¦¦**, DYNAMIC_STAR T45¦¦**): 
 00-03 HashJoin(condition=[=($1, $4)], joinType=[inner], semi-join: =[false]) : rowType = RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey, DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey): 
 00-05 *SelectionVectorRemover* : rowType = RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey):
 00-07 Filter(condition=[>($1, 58999)]) : rowType = RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey):
 00-09 Project(T44¦¦**=[$0], l_orderkey=[$1]) : rowType = RecordType(DYNAMIC_STAR T44¦¦**, ANY l_orderkey): 
 00-11 Scan(table=[[cp, tpch/lineitem.parquet]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/lineitem.parquet]], 
 00-04 *SelectionVectorRemover* : rowType = RecordType(DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey):
 00-06 Filter(condition=[AND(>($1, 1498), >($2, 58999))]) : rowType = RecordType(DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey): 
 00-08 Project(T45¦¦**=[$0], o_custkey=[$1], o_orderkey=[$2]) : rowType = RecordType(DYNAMIC_STAR T45¦¦**, ANY o_custkey, ANY o_orderkey):
 00-10 Scan(table=[[cp, tpch/orders.parquet]],
{panel}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)