You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/06/26 02:51:25 UTC

[jira] [Commented] (DRILL-989) TPCH 20 returning wrong results

    [ https://issues.apache.org/jira/browse/DRILL-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14044226#comment-14044226 ] 

Aman Sinha commented on DRILL-989:
----------------------------------

Based on the Explain plan,  the wrong result is likely caused by incorrect distribution on two sides of a hash join.  Note that the left side of the HashJoin 02-08 below is distributing on 2 columns ($0 and $2).  Right side of the HashJoin is the Project 02-09 which is distributed on 1 column only based on the HashToRandomExchange below the HashAggregate.   We should either be distributing on both columns on both sides of the join or pick one join key (based on certain criteria) and distribute both sides on the corresponding join key. 

> 02-08                            HashJoin(condition=[AND(=($0, $3), =($2, $4))], joinType=[left])
> 02-10                              HashToRandomExchange(dist0=[[$0]], dist1=[[$2]])
> 04-01                                Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02                                  HashJoin(condition=[=($3, $4)], joinType=[inner])
> 04-04                                    Project($f1=[$0], $f2=[$2], $f3=[$1], $f4=[$0])
> 04-05                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03                                    BroadcastExchange
> 07-01                                      HashAgg(group=[{0}])
> 07-02                                        HashToRandomExchange(dist0=[[$0]])
> 09-01                                          Project(p_partkey=[$1])
> 09-02                                            SelectionVectorRemover
> 09-03                                              Filter(condition=[LIKE($0, 'antique%')])
> 09-04                                                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09                              Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11                                HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12                                  HashToRandomExchange(dist0=[[$0]])

> TPCH 20 returning wrong results
> -------------------------------
>
>                 Key: DRILL-989
>                 URL: https://issues.apache.org/jira/browse/DRILL-989
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Flow
>            Reporter: Ramana Inukonda Nagaraj
>
> TPCH 20 returns 0 rows, successfully executes though.
> Text version of physical plan:
> {code}
> 00-00    Screen
> 00-01      Project(s_name=[$0], s_address=[$1])
> 00-02        SingleMergeExchange(sort0=[0 ASC])
> 01-01          SelectionVectorRemover
> 01-02            Sort(sort0=[$0], dir0=[ASC])
> 01-03              HashToRandomExchange(dist0=[[$0]])
> 02-01                Project(s_name=[$0], s_address=[$1])
> 02-02                  HashJoin(condition=[=($2, $3)], joinType=[inner])
> 02-04                    HashToRandomExchange(dist0=[[$2]])
> 03-01                      Project($f3=[$2], $f4=[$3], $f8=[$0])
> 03-02                        HashJoin(condition=[=($1, $5)], joinType=[inner])
> 03-04                          Project(s_suppkey=[$3], s_nationkey=[$2], s_name=[$1], s_address=[$0])
> 03-05                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/supplier]], selectionRoot=/drill/testdata/tpch-multi/supplier, columns=[SchemaPath [`s_suppkey`], SchemaPath [`s_nationkey`], SchemaPath [`s_name`], SchemaPath [`s_address`]]]])
> 03-03                          BroadcastExchange
> 06-01                            SelectionVectorRemover
> 06-02                              Filter(condition=[=(CAST($0):CHAR(5) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary", 'KENYA')])
> 06-03                                Project(n_name=[$1], n_nationkey=[$0])
> 06-04                                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/nation]], selectionRoot=/drill/testdata/tpch-multi/nation, columns=[SchemaPath [`n_name`], SchemaPath [`n_nationkey`]]]])
> 02-03                    StreamAgg(group=[{0}])
> 02-05                      Project(ps_suppkey=[$2])
> 02-06                        SelectionVectorRemover
> 02-07                          Filter(condition=[AND(true, >($1, CAST(*(0.5, $5)):ANY))])
> 02-08                            HashJoin(condition=[AND(=($0, $3), =($2, $4))], joinType=[left])
> 02-10                              HashToRandomExchange(dist0=[[$0]], dist1=[[$2]])
> 04-01                                Project($f1=[$0], $f2=[$1], $f3=[$2])
> 04-02                                  HashJoin(condition=[=($3, $4)], joinType=[inner])
> 04-04                                    Project($f1=[$0], $f2=[$2], $f3=[$1], $f4=[$0])
> 04-05                                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_availqty`], SchemaPath [`ps_suppkey`]]]])
> 04-03                                    BroadcastExchange
> 07-01                                      HashAgg(group=[{0}])
> 07-02                                        HashToRandomExchange(dist0=[[$0]])
> 09-01                                          Project(p_partkey=[$1])
> 09-02                                            SelectionVectorRemover
> 09-03                                              Filter(condition=[LIKE($0, 'antique%')])
> 09-04                                                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> 02-09                              Project($f0=[$0], $f10=[$1], $f20=[$2])
> 02-11                                HashAgg(group=[{0, 1}], agg#0=[SUM($2)])
> 02-12                                  HashToRandomExchange(dist0=[[$0]])
> 05-01                                    Project($f0=[$4], $f1=[$5], l_quantity=[$3])
> 05-02                                      HashJoin(condition=[AND(=($0, $4), =($1, $5))], joinType=[inner])
> 05-04                                        SelectionVectorRemover
> 05-05                                          Filter(condition=[AND(>=($2, 1993-01-01), <($2, +(1993-01-01, 12)))])
> 05-06                                            Project(l_partkey=[$2], l_suppkey=[$1], l_shipdate=[$3], l_quantity=[$0])
> 05-07                                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/lineitem]], selectionRoot=/drill/testdata/tpch-multi/lineitem, columns=[SchemaPath [`l_partkey`], SchemaPath [`l_suppkey`], SchemaPath [`l_shipdate`], SchemaPath [`l_quantity`]]]])
> 05-03                                        BroadcastExchange
> 08-01                                          HashAgg(group=[{0, 1}])
> 08-02                                            HashToRandomExchange(dist0=[[$0]])
> 10-01                                              Project($f0=[$0], $f1=[$1])
> 10-02                                                HashJoin(condition=[=($2, $3)], joinType=[inner])
> 10-04                                                  Project($f1=[$0], $f3=[$1], $f4=[$0])
> 10-05                                                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/partsupp]], selectionRoot=/drill/testdata/tpch-multi/partsupp, columns=[SchemaPath [`ps_partkey`], SchemaPath [`ps_suppkey`]]]])
> 10-03                                                  BroadcastExchange
> 11-01                                                    HashAgg(group=[{0}])
> 11-02                                                      HashToRandomExchange(dist0=[[$0]])
> 12-01                                                        Project(p_partkey=[$1])
> 12-02                                                          SelectionVectorRemover
> 12-03                                                            Filter(condition=[LIKE($0, 'antique%')])
> 12-04                                                              Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/tpch-multi/part]], selectionRoot=/drill/testdata/tpch-multi/part, columns=[SchemaPath [`p_name`], SchemaPath [`p_partkey`]]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)