Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/04/28 02:16:06 UTC
[jira] [Commented] (DRILL-2092) Incorrect result with count distinct and sum aggregates
[ https://issues.apache.org/jira/browse/DRILL-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516000#comment-14516000 ]
Victoria Markman commented on DRILL-2092:
-----------------------------------------
{code}
Verified fixed in 0.9.0
drillGitId=3689522d4a7035a966f19695a678c6881fdaeba6
{code}
Tests are checked in under:
Functional/Passing/aggregation/count_distinct - sanity test cases for distinct aggregation with different data types.
Functional/Passing/aggregation/count_distinct/q[1-8].sql - queries to verify this particular bug
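For a rough idea of what a correct result looks like, here is a minimal sketch that runs the failing query shape against the five test.json rows from the issue description. It uses Python's built-in sqlite3 module as a stand-in engine, which is an assumption for illustration only: it is not Drill, and it is not one of the checked-in q[1-8].sql queries. The table name `test` and the helper `distinct_and_sum` are hypothetical.

```python
import sqlite3

# The five rows of test.json from the issue description (None stands for JSON null).
ROWS = [(10, 10), (20, 20), (20, 20), (30, 30), (None, None)]


def distinct_and_sum(rows):
    """Run the query shape from the bug report against an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (a1 INTEGER, b1 INTEGER)")
    conn.executemany("INSERT INTO test VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT a1, COUNT(DISTINCT a1), SUM(a1) FROM test GROUP BY a1"
    ).fetchall()
    conn.close()
    return result


result = distinct_and_sum(ROWS)
for row in result:
    print(row)
```

In this stand-in the null group is present, with a distinct count of 0 and a null sum, so four groups come back rather than three.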
> Incorrect result with count distinct and sum aggregates
> -------------------------------------------------------
>
> Key: DRILL-2092
> URL: https://issues.apache.org/jira/browse/DRILL-2092
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Aman Sinha
> Priority: Critical
> Fix For: 0.8.0
>
>
> test.json
> {code}
> { "a1" : 10 , "b1" : 10 }
> { "a1" : 20 , "b1" : 20 }
> { "a1" : 20 , "b1" : 20}
> { "a1" : 30 , "b1" : 30 }
> { "a1" : null , "b1": null}
> {code}
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(distinct a1) from `test.json` group by a1;
> +------------+------------+
> | a1 | EXPR$1 |
> +------------+------------+
> | 10 | 1 |
> | 20 | 1 |
> | 30 | 1 |
> | null | 0 |
> +------------+------------+
> 4 rows selected (0.096 seconds)
> {code}
> If I add sum on the same column, I get a wrong result (the null group is missing):
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(distinct a1), sum(a1) from `test.json` group by a1;
> +------------+------------+------------+
> | a1 | EXPR$1 | EXPR$2 |
> +------------+------------+------------+
> | 10 | 1 | 10 |
> | 20 | 1 | 40 |
> | 30 | 1 | 30 |
> +------------+------------+------------+
> 3 rows selected (0.137 seconds)
> {code}
> Non-distinct count works correctly:
> {code}
> 0: jdbc:drill:schema=dfs> select a1, count(a1), sum(a1) from `test.json` group by a1;
> +------------+------------+------------+
> | a1 | EXPR$1 | EXPR$2 |
> +------------+------------+------------+
> | 10 | 1 | 10 |
> | 20 | 2 | 40 |
> | 30 | 1 | 30 |
> | null | 0 | null |
> +------------+------------+------------+
> 4 rows selected (0.187 seconds)
> {code}
> Plan for the query with the wrong result:
> {code}
> 00-01 Project(a1=[$0], EXPR$1=[$1], EXPR$2=[$2])
> 00-02 Project(a1=[$0], EXPR$1=[$3], EXPR$2=[$1])
> 00-03 HashJoin(condition=[IS NOT DISTINCT FROM($0, $2)], joinType=[inner])
> 00-05 HashAgg(group=[{0}], EXPR$2=[SUM($0)])
> 00-07 Scan(groupscan=[EasyGroupScan [selectionRoot=/test.json, numFiles=1, columns=[`a1`], files=[maprfs:/test.json]]])
> 00-04 Project(a10=[$0], EXPR$1=[$1])
> 00-06 HashAgg(group=[{0}], EXPR$1=[COUNT($0)])
> 00-08 HashAgg(group=[{0}])
> 00-09 Scan(groupscan=[EasyGroupScan [selectionRoot=/test.json, numFiles=1, columns=[`a1`], files=[maprfs:/test.json]]])
> {code}
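The plan above computes SUM and COUNT(DISTINCT) in separate HashAgg branches and recombines them with an inner HashJoin on the grouping key using IS NOT DISTINCT FROM. The thread does not state what the actual defect in Drill was, so the following is only a sketch of why null-key handling at that join matters: if null keys are not matched to each other, the null group drops out and the 3-row output appears. All function names here are hypothetical and none of this is Drill code.

```python
def hash_agg_sum(values):
    """Group by value and SUM it, keeping a null (None) group with a null sum."""
    groups = {}
    for v in values:
        if v is None:
            groups.setdefault(None, None)  # SUM over only nulls stays null
        else:
            groups[v] = (groups.get(v) or 0) + v
    return groups


def hash_agg_count_distinct(values):
    """Distinct on the key, then COUNT of non-null values per group."""
    return {v: (0 if v is None else 1) for v in set(values)}


def inner_join(left, right, null_safe):
    """Inner join on the key; null_safe=True mimics IS NOT DISTINCT FROM."""
    out = []
    for lk, lv in left.items():
        for rk, rv in right.items():
            if lk is None or rk is None:
                match = null_safe and lk is None and rk is None
            else:
                match = lk == rk
            if match:
                out.append((lk, rv, lv))  # (a1, count distinct, sum)
    return out


A1 = [10, 20, 20, 30, None]  # column a1 of test.json
sums = hash_agg_sum(A1)
counts = hash_agg_count_distinct(A1)

with_null_safe = inner_join(sums, counts, null_safe=True)
with_plain_eq = inner_join(sums, counts, null_safe=False)
print(len(with_null_safe), "rows with null-safe matching")
print(len(with_plain_eq), "rows with plain equality")
```

With null-safe matching the sketch keeps the (null, 0, null) row; with plain equality it yields only the 10, 20 and 30 groups, the same shape as the wrong result reported above.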