You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/04/28 02:11:05 UTC
[jira] [Commented] (DRILL-1977) Wrong result with aggregation on
top of UNION ALL operator
[ https://issues.apache.org/jira/browse/DRILL-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14515993#comment-14515993 ]
Victoria Markman commented on DRILL-1977:
-----------------------------------------
{code}
Verified fixed in 0.9.0
drillGitId=3689522d4a7035a966f19695a678c6881fdaeba6
{code}
Tests:
Functional/Passing/aggregation/bugs/drill-1977-1.sql
Functional/Passing/aggregation/bugs/drill-1977-2.sql
> Wrong result with aggregation on top of UNION ALL operator
> ----------------------------------------------------------
>
> Key: DRILL-1977
> URL: https://issues.apache.org/jira/browse/DRILL-1977
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Sean Hsuan-Yi Chu
> Priority: Critical
> Fix For: 0.8.0
>
>
> {code}
> Jan 10 01:17:13 root@~ ] # hadoop fs -cat /test/t.json
> { "a1": 0, "b1": 0, "c1": "true", "d1" : "2015-01-02"}
> { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-03"}
> { "a1": 0, "b1": 0, "c1": "false" , "d1" : "2015-01-04"}
> { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-05"}
> { "a1": 1, "b1": 1, "c1": "true" , "d1" : "2015-01-06"}
> {code}
> Query below returns wrong result, we should have 4 groups, instead we get only two groups from right side of the union.
> Notice lack of project step above union all (unclear if it has anything to to with the bug)
> {code}
> 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1;
> +------------+------------+------------+------------+
> | calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
> +------------+------------+------------+------------+
> | 100 | 0 | 0 | 3 |
> | 101 | 1 | 1 | 2 |
> +------------+------------+------------+------------+
> 2 rows selected (0.181 seconds)
> 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02 HashAgg(group=[{0}], EXPR$1=[MAX($1)], EXPR$2=[MIN($2)], EXPR$3=[COUNT($3)])
> 00-03 UnionAll(all=[true])
> 00-05 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0])
> 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
> 00-04 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0])
> 00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
> {code}
> Correct result, when order of the columns is different in aggregate functions: "min(b1), max(a1)" instead of "max(a1), min(b1)"
> {code}
> 0: jdbc:drill:schema=dfs> select calc1, min(b1), max(a1), count(c1) from (select a1+10 as calc1, a1, b1, c1 from `t.json` union all select a1+100 as calc1, a1, b1, c1 from `t.json`) group by calc1;
> +------------+------------+------------+------------+
> | calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
> +------------+------------+------------+------------+
> | 10 | 0 | 0 | 3 |
> | 11 | 1 | 1 | 2 |
> | 100 | 0 | 0 | 3 |
> | 101 | 1 | 1 | 2 |
> +------------+------------+------------+------------+
> 4 rows selected (0.256 seconds)
> 00-01 Project(calc1=[$0], EXPR$1=[$1], EXPR$2=[$2], EXPR$3=[$3])
> 00-02 HashAgg(group=[{0}], EXPR$1=[MIN($1)], EXPR$2=[MAX($2)], EXPR$3=[COUNT($3)])
> 00-03 Project(calc1=[$0], b1=[$2], a1=[$1], c1=[$3])
> 00-04 UnionAll(all=[true])
> 00-06 Project(calc1=[+($2, 10)], a1=[$2], b1=[$1], c1=[$0])
> 00-08 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]])
> 00-05 Project(calc1=[+($2, 100)], a1=[$2], b1=[$1], c1=[$0])
> 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test/t.json, numFiles=1, columns=[`a1`, `b1`, `c1`], files=[maprfs:/test/t.json]]]){code}
> Non-union query works correctly:
> {code}
> 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(a1), max(b1), count(c1) from `t.json` group by a1+10;
> +------------+------------+------------+------------+
> | calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
> +------------+------------+------------+------------+
> | 10 | 0 | 0 | 3 |
> | 11 | 1 | 1 | 2 |
> +------------+------------+------------+------------+
> 2 rows selected (0.142 seconds)
> 0: jdbc:drill:schema=dfs> select a1+10 as calc1, min(b1), max(a1), count(c1) from `t.json` group by a1+10;
> +------------+------------+------------+------------+
> | calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
> +------------+------------+------------+------------+
> | 10 | 0 | 0 | 3 |
> | 11 | 1 | 1 | 2 |
> +------------+------------+------------+------------+
> 2 rows selected (0.153 seconds)
> {code}
> If I switch legs of a UNION, we still get wrong result, but from the right side:
> {code}
> 0: jdbc:drill:schema=dfs> select calc1, max(a1), min(b1), count(c1) from (select a1+100 as calc1, a1, b1, c1 from `t.json` union all select a1+10 as calc1, a1, b1, c1 from `t.json`) group by calc1;
> +------------+------------+------------+------------+
> | calc1 | EXPR$1 | EXPR$2 | EXPR$3 |
> +------------+------------+------------+------------+
> | 10 | 0 | 0 | 3 |
> | 11 | 1 | 1 | 2 |
> +------------+------------+------------+------------+
> 2 rows selected (0.19 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)