Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/04/04 01:06:53 UTC
[jira] [Updated] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
[ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha updated DRILL-2247:
------------------------------
Fix Version/s: (was: 0.9.0)
1.0.0
> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
> Key: DRILL-2247
> URL: https://issues.apache.org/jira/browse/DRILL-2247
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 0.8.0
> Reporter: Victoria Markman
> Assignee: Jinfeng Ni
> Fix For: 1.0.0
>
> Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> | c1 |
> +------------+
> | 1032.6516 |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> | c1 |
> +------------+
> | 1057.3117 |
> | 1090.8299 |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
> Correct result is NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > sum(x2.c1) as simple_sum,
> . . . . . . . . . . . . > sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000 | null |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
> Query plan for correct result:
> {code}
> 00-01 Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02 UnionExchange
> 01-01 Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02 HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03 Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04 HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05 Project(c10=[$0])
> 01-07 HashToRandomExchange(dist0=[[$0]])
> 03-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum with case returns 0.0 instead of NULL
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . > count(*) as count_star,
> . . . . . . . . . . . . > sum(x2.c1) as simle_sum,
> . . . . . . . . . . . . > round(avg(x2.c1)) as round_avg_x2_c1,
> . . . . . . . . . . . . > sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . > x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . > x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1 | null | null | 0.0 |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the query returning the wrong result:
> {code}
> 00-01 Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02 UnionExchange
> 01-01 Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02 HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03 Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04 HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06 HashToRandomExchange(dist0=[[$0]])
> 02-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05 Project(c10=[$0])
> 01-07 HashToRandomExchange(dist0=[[$0]])
> 03-01 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Cut/paste version of the query; if you comment out the round function, the query returns the correct result.
> {code:sql}
> select
> count(*) as count_star,
> sum(x2.c1) as simle_sum,
> round(avg(x2.c1)) as round_avg_x2_c1,
> sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
> x1 left outer join x2 on x1.c1 = x2.c1
> group by
> x1.c1;
> {code}
> Not sure how critical this bug is, but the inconsistency is bad.
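As a cross-check of the expected semantics (not of Drill itself), the same tables and query can be run against SQLite: standard SQL SUM ignores NULLs and yields NULL when every input in the group is NULL, so both sums over the unmatched null-producing side should come back NULL. This is a minimal sketch using the sample data from the report; it does not reproduce the Drill bug, it only shows the answer Drill should give.

```python
# Rebuild the x1/x2 tables from the report in an in-memory SQLite
# database and run the (simplified) query from the bug description.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x1 (c1 REAL);
    CREATE TABLE x2 (c1 REAL);
    INSERT INTO x1 VALUES (1032.6516);
    INSERT INTO x2 VALUES (1057.3117), (1090.8299);
""")

row = conn.execute("""
    SELECT
        COUNT(*)                             AS count_star,
        SUM(x2.c1)                           AS simple_sum,
        SUM(CASE WHEN x2.c1 = 0 THEN 100
                 ELSE ROUND(x2.c1 / 12) END) AS sum_with_case
    FROM x1 LEFT OUTER JOIN x2 ON x1.c1 = x2.c1
    GROUP BY x1.c1
""").fetchone()

# x1's single row matches nothing in x2, so x2.c1 is NULL on the
# joined row: CASE propagates the NULL and both SUMs return NULL.
print(row)  # (1, None, None)
```

This matches the "correct result" shown above (sum_with_case is NULL), whereas the Drill plan in question emits 0.0 for sum_with_case.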
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)