You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/03/18 01:18:38 UTC
[jira] [Assigned] (DRILL-2488) Wrong result on join between two
subqueries with aggregation
[ https://issues.apache.org/jira/browse/DRILL-2488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha reassigned DRILL-2488:
---------------------------------
Assignee: Aman Sinha (was: Chris Westin)
> Wrong result on join between two subqueries with aggregation
> ------------------------------------------------------------
>
> Key: DRILL-2488
> URL: https://issues.apache.org/jira/browse/DRILL-2488
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 0.9.0
>
> Attachments: t1.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> | a1 | b1 | c1 |
> +------------+------------+------------+
> | 1 | aaaaa | 2015-01-01 |
> | 2 | bbbbb | 2015-01-02 |
> | 3 | ccccc | 2015-01-03 |
> | 4 | null | 2015-01-04 |
> | 5 | eeeee | 2015-01-05 |
> | 6 | fffff | 2015-01-06 |
> | 7 | ggggg | 2015-01-07 |
> | null | hhhhh | 2015-01-08 |
> | 9 | iiiii | null |
> | 10 | jjjjj | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.15 seconds)
> {code}
> This result is incorrect, one row is missing
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > b1,
> . . . . . . . . . . . . > count(distinct a1)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > b1
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . >
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > b1,
> . . . . . . . . . . . . > count(distinct a1)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > b1
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq2(x1, y1)
> . . . . . . . . . . . . > on
> . . . . . . . . . . . . > sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . > sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> | x1 | y1 | x10 | y10 |
> +------------+------------+------------+------------+
> | bbbbb | 1 | bbbbb | 1 |
> | ccccc | 1 | ccccc | 1 |
> | eeeee | 1 | eeeee | 1 |
> | fffff | 1 | fffff | 1 |
> +------------+------------+------------+------------+
> 4 rows selected (0.28 seconds)
> {code}
> Explain plan for the wrong result:
> {code}
> 00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
> 00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner])
> 00-05 Limit(offset=[1], fetch=[5])
> 00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-09 Sort(sort0=[$0], dir0=[ASC])
> 00-11 StreamAgg(group=[{0, 1}])
> 00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
> 00-04 Project(b10=[$0], EXPR$10=[$1])
> 00-06 SelectionVectorRemover
> 00-08 Sort(sort0=[$0], dir0=[ASC])
> 00-10 Filter(condition=[=($1, $1)])
> 00-12 Limit(offset=[1], fetch=[5])
> 00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
> 00-16 Sort(sort0=[$0], dir0=[ASC])
> 00-17 StreamAgg(group=[{0, 1}])
> 00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
> {code}
> If you turn off merge join, query returns correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > b1,
> . . . . . . . . . . . . > count(distinct a1)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > b1
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq1(x1, y1)
> . . . . . . . . . . . . >
> . . . . . . . . . . . . > inner join
> . . . . . . . . . . . . > (
> . . . . . . . . . . . . > select
> . . . . . . . . . . . . > b1,
> . . . . . . . . . . . . > count(distinct a1)
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > t1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > b1
> . . . . . . . . . . . . > order by
> . . . . . . . . . . . . > b1 limit 5 offset 1
> . . . . . . . . . . . . > ) as sq2(x1, y1)
> . . . . . . . . . . . . > on
> . . . . . . . . . . . . > sq1.x1 = sq2.x1 and
> . . . . . . . . . . . . > sq2.y1 = sq2.y1
> . . . . . . . . . . . . > ;
> +------------+------------+------------+------------+
> | x1 | y1 | x10 | y10 |
> +------------+------------+------------+------------+
> | bbbbb | 1 | bbbbb | 1 |
> | ccccc | 1 | ccccc | 1 |
> | eeeee | 1 | eeeee | 1 |
> | fffff | 1 | fffff | 1 |
> | ggggg | 1 | ggggg | 1 |
> +------------+------------+------------+------------+
> 5 rows selected (0.352 seconds)
> {code}
> cut/paste reproduction
> {code:sql}
> select * from
> (
> select
> b1,
> count(distinct a1)
> from
> t1
> group by
> b1
> order by
> b1 limit 5 offset 1
> ) as sq1(x1, y1)
>
> inner join
>
> (
> select
> b1,
> count(distinct a1)
> from
> t1
> group by
> b1
> order by
> b1 limit 5 offset 1
> ) as sq2(x1, y1)
> on
> sq1.x1 = sq2.x1 and
> sq2.y1 = sq2.y1
> ;
> {code}
> This test has been running and passing in regression test suite until framework was switched to a new code, where JSON parsing was replaced with jackson and for a brief period ( I believe Friday afternoon until now ) this suite was not executed.
> We already have a merge join bug DRILL-2010, but this one seems to be different (small data set) and feels like a recent regression.
> For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running with merge join disabled until this issue is fixed. Will need to remove alter session from this file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)