You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/06/26 02:51:25 UTC
[jira] [Commented] (DRILL-989) TPCH 20 returning wrong results
[ https://issues.apache.org/jira/browse/DRILL-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044226#comment-14044226 ]
Aman Sinha commented on DRILL-989:
----------------------------------
Based on the Explain plan, the wrong result is likely caused by incorrect distribution on two sides of a hash join. Note that the left side of the HashJoin 02-08 below is distributing on 2 columns ($0 and $2). Right side of the HashJoin is the Project 02-09 which is distributed on 1 column only based on the HashToRandomExchange below the HashAggregate. We should either be distributing on both columns on both sides of the join or pick one join key (based on certain criteria) and distribute both sides on the corresponding join key.
> 02-08 HashJoin(condition=[AND(=($0, $3), =($2, $4))], joinType=[left])
> 02-10 HashToRandomExchange(dist0=[[$0]], dist1=[[$2]])
> 04-01 Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02 HashJoin(condition=[=($3, $4)], joinType=[inner])
> 04-04 Project($f1=[$0], $f2=[$2], $f3=[$1], $f4=[$0])
> 04-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03 BroadcastExchange
> 07-01 HashAgg(group=[{0}])
> 07-02 HashToRandomExchange(dist0=[[$0]])
> 09-01 Project(p_partkey=[$1])
> 09-02 SelectionVectorRemover
> 09-03 Filter(condition=[LIKE($0, 'antique%')])
> 09-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09 Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11 HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12 HashToRandomExchange(dist0=[[$0]])
> TPCH 20 returning wrong results
> -------------------------------
>
> Key: DRILL-989
> URL: https://issues.apache.org/jira/browse/DRILL-989
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Flow
> Reporter: Ramana Inukonda Nagaraj
>
> TPCH 20 returns 0 rows, successfully executes though.
> Text version of physical plan:
> {code}
> 00-00 Screen
> 00-01 Project(s_name=[$0], s_address=[$1])
> 00-02 SingleMergeExchange(sort0=[0 ASC])
> 01-01 SelectionVectorRemover
> 01-02 Sort(sort0=[$0], dir0=[ASC])
> 01-03 HashToRandomExchange(dist0=[[$0]])
> 02-01 Project(s_name=[$0], s_address=[$1])
> 02-02 HashJoin(condition=[=($2, $3)], joinType=[inner])
> 02-04 HashToRandomExchange(dist0=[[$2]])
> 03-01 Project($f3=[$2], $f4=[$3], $f8=[$0])
> 03-02 HashJoin(condition=[=($1, $5)], joinType=[inner])
> 03-04 Project(s_suppkey=[$3], s_nationkey=[$2], s_name=[$1], s_address=[$0])
> 03-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/supplier]], selectionRoot=/drill/testdata/tpch-multi/supplier, columns=[SchemaPath [`s_suppkey`], SchemaPath [`s_nationkey`], SchemaPath [`s_name`], SchemaPath [`s_address`]]]])
> 03-03 BroadcastExchange
> 06-01 SelectionVectorRemover
> 06-02 Filter(condition=[=(CAST($0):CHAR(5) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'KENYA')])
> 06-03 Project(n_name=[$1], n_nationkey=[$0])
> 06-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/nation]], selectionRoot=/drill/testdata/tpch-multi/nation, columns=[SchemaPath [`n_name`], SchemaPath [`n_nationkey`]]]])
> 02-03 StreamAgg(group=[{0}])
> 02-05 Project(ps_suppkey=[$2])
> 02-06 SelectionVectorRemover
> 02-07 Filter(condition=[AND(true, >($1, CAST(*(0.5, $5)):ANY))])
> 02-08 HashJoin(condition=[AND(=($0, $3), =($2, $4))], joinType=[left])
> 02-10 HashToRandomExchange(dist0=[[$0]], dist1=[[$2]])
> 04-01 Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02 HashJoin(condition=[=($3, $4)], joinType=[inner])
> 04-04 Project($f1=[$0], $f2=[$2], $f3=[$1], $f4=[$0])
> 04-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03 BroadcastExchange
> 07-01 HashAgg(group=[{0}])
> 07-02 HashToRandomExchange(dist0=[[$0]])
> 09-01 Project(p_partkey=[$1])
> 09-02 SelectionVectorRemover
> 09-03 Filter(condition=[LIKE($0, 'antique%')])
> 09-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09 Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11 HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12 HashToRandomExchange(dist0=[[$0]])
> 05-01 Project($f0=[$4], $f1=[$5], l_quantity=[$3])
> 05-02 HashJoin(condition=[AND(=($0, $4), =($1, $5))], joinType=[inner])
> 05-04 SelectionVectorRemover
> 05-05 Filter(condition=[AND(>=($2, 1993-01-01), <($2, +(1993-01-01, 12)))])
> 05-06 Project(l_partkey=[$2], l_suppkey=[$1], l_shipdate=[$3], l_quantity=[$0])
> 05-07 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/lineitem]], selectionRoot=/drill/testdata/tpch-multi/lineitem, columns=[SchemaPath [`l_partkey`], SchemaPath [`l_suppkey`], SchemaPath [`l_shipdate`], SchemaPath [`l_quantity`]]]])
> 05-03 BroadcastExchange
> 08-01 HashAgg(group=[{0, 1}])
> 08-02 HashToRandomExchange(dist0=[[$0]])
> 10-01 Project($f0=[$0], $f1=[$1])
> 10-02 HashJoin(condition=[=($2, $3)], joinType=[inner])
> 10-04 Project($f1=[$0], $f3=[$1], $f4=[$0])
> 10-05 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_suppkey`]]]])
> 10-03 BroadcastExchange
> 11-01 HashAgg(group=[{0}])
> 11-02 HashToRandomExchange(dist0=[[$0]])
> 12-01 Project(p_partkey=[$1])
> 12-02 SelectionVectorRemover
> 12-03 Filter(condition=[LIKE($0, 'antique%')])
> 12-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)