You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/03/18 00:12:38 UTC
[jira] [Created] (DRILL-2488) Wrong result on join between two
subqueries with aggregation
Victoria Markman created DRILL-2488:
---------------------------------------
Summary: Wrong result on join between two subqueries with aggregation
Key: DRILL-2488
URL: https://issues.apache.org/jira/browse/DRILL-2488
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 0.8.0
Reporter: Victoria Markman
Assignee: Chris Westin
Priority: Critical
{code}
0: jdbc:drill:schema=dfs> select * from t1;
+------------+------------+------------+
| a1 | b1 | c1 |
+------------+------------+------------+
| 1 | aaaaa | 2015-01-01 |
| 2 | bbbbb | 2015-01-02 |
| 3 | ccccc | 2015-01-03 |
| 4 | null | 2015-01-04 |
| 5 | eeeee | 2015-01-05 |
| 6 | fffff | 2015-01-06 |
| 7 | ggggg | 2015-01-07 |
| null | hhhhh | 2015-01-08 |
| 9 | iiiii | null |
| 10 | jjjjj | 2015-01-10 |
+------------+------------+------------+
10 rows selected (0.15 seconds)
{code}
This result is incorrect, one row is missing
{code}
0: jdbc:drill:schema=dfs> select * from
. . . . . . . . . . . . > (
. . . . . . . . . . . . > select
. . . . . . . . . . . . > b1,
. . . . . . . . . . . . > count(distinct a1)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > b1
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > b1 limit 5 offset 1
. . . . . . . . . . . . > ) as sq1(x1, y1)
. . . . . . . . . . . . >
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . >
. . . . . . . . . . . . > (
. . . . . . . . . . . . > select
. . . . . . . . . . . . > b1,
. . . . . . . . . . . . > count(distinct a1)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > b1
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > b1 limit 5 offset 1
. . . . . . . . . . . . > ) as sq2(x1, y1)
. . . . . . . . . . . . > on
. . . . . . . . . . . . > sq1.x1 = sq2.x1 and
. . . . . . . . . . . . > sq2.y1 = sq2.y1
. . . . . . . . . . . . > ;
+------------+------------+------------+------------+
| x1 | y1 | x10 | y10 |
+------------+------------+------------+------------+
| bbbbb | 1 | bbbbb | 1 |
| ccccc | 1 | ccccc | 1 |
| eeeee | 1 | eeeee | 1 |
| fffff | 1 | fffff | 1 |
+------------+------------+------------+------------+
4 rows selected (0.28 seconds)
{code}
Explain plan for the wrong result:
{code}
00-01 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-02 Project(x1=[$0], y1=[$1], x10=[$2], y10=[$3])
00-03 MergeJoin(condition=[=($0, $2)], joinType=[inner])
00-05 Limit(offset=[1], fetch=[5])
00-07 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-09 Sort(sort0=[$0], dir0=[ASC])
00-11 StreamAgg(group=[{0, 1}])
00-13 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-15 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
00-04 Project(b10=[$0], EXPR$10=[$1])
00-06 SelectionVectorRemover
00-08 Sort(sort0=[$0], dir0=[ASC])
00-10 Filter(condition=[=($1, $1)])
00-12 Limit(offset=[1], fetch=[5])
00-14 StreamAgg(group=[{0}], EXPR$1=[COUNT($1)])
00-16 Sort(sort0=[$0], dir0=[ASC])
00-17 StreamAgg(group=[{0, 1}])
00-18 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-19 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`, `a1`]]])
{code}
If you turn off merge join, query returns correct result:
{code}
0: jdbc:drill:schema=dfs> select * from
. . . . . . . . . . . . > (
. . . . . . . . . . . . > select
. . . . . . . . . . . . > b1,
. . . . . . . . . . . . > count(distinct a1)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > b1
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > b1 limit 5 offset 1
. . . . . . . . . . . . > ) as sq1(x1, y1)
. . . . . . . . . . . . >
. . . . . . . . . . . . > inner join
. . . . . . . . . . . . > (
. . . . . . . . . . . . > select
. . . . . . . . . . . . > b1,
. . . . . . . . . . . . > count(distinct a1)
. . . . . . . . . . . . > from
. . . . . . . . . . . . > t1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . > b1
. . . . . . . . . . . . > order by
. . . . . . . . . . . . > b1 limit 5 offset 1
. . . . . . . . . . . . > ) as sq2(x1, y1)
. . . . . . . . . . . . > on
. . . . . . . . . . . . > sq1.x1 = sq2.x1 and
. . . . . . . . . . . . > sq2.y1 = sq2.y1
. . . . . . . . . . . . > ;
+------------+------------+------------+------------+
| x1 | y1 | x10 | y10 |
+------------+------------+------------+------------+
| bbbbb | 1 | bbbbb | 1 |
| ccccc | 1 | ccccc | 1 |
| eeeee | 1 | eeeee | 1 |
| fffff | 1 | fffff | 1 |
| ggggg | 1 | ggggg | 1 |
+------------+------------+------------+------------+
5 rows selected (0.352 seconds)
{code}
This test has been running and passing in regression test suite until framework was switched to a new code, where JSON parsing was replaced with jackson and for a brief period ( I believe Friday afternoon until now ) this suite was not executed.
We already have a merge join bug DRILL-2010, but this one seems to be different (small data set) and feels like a recent regression.
For QA: test Functional/Passing/aggregation/sanity/q18.sql will be running with merge join disabled until this issue is fixed. Will need to remove alter session from this file.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)