Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/04/04 01:06:53 UTC

[jira] [Updated] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result

     [ https://issues.apache.org/jira/browse/DRILL-2247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha updated DRILL-2247:
------------------------------
    Fix Version/s:     (was: 0.9.0)
                   1.0.0

> SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
> --------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-2247
>                 URL: https://issues.apache.org/jira/browse/DRILL-2247
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Jinfeng Ni
>             Fix For: 1.0.0
>
>         Attachments: x1.parquet, x2.parquet
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from x1;
> +------------+
> |     c1     |
> +------------+
> | 1032.6516  |
> +------------+
> 1 row selected (0.093 seconds)
> 0: jdbc:drill:schema=dfs> select * from x2;
> +------------+
> |     c1     |
> +------------+
> | 1057.3117  |
> | 1090.8299  |
> +------------+
> 2 rows selected (0.085 seconds)
> {code}
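> Note that the single value in x1 (1032.6516) does not match anything in x2, so a left outer join on c1 produces exactly one row in which x2.c1 is NULL. This is the null-producing side the summary refers to, and it is easy to confirm with a probe query:
> {code:sql}
> -- one row is expected, with x2_c1 = null
> select x1.c1 as x1_c1, x2.c1 as x2_c1
> from x1 left outer join x2 on x1.c1 = x2.c1;
> {code}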
> Correct result (sum_with_case is NULL, as expected):
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         sum(x2.c1)                                                 as simple_sum,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+---------------+
> | simple_sum | sum_with_case |
> +------------+---------------+
> | 0.0000     | null          |
> +------------+---------------+
> 1 row selected (0.173 seconds)
> {code}
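> The NULL above is expected: on the unmatched row x2.c1 is NULL, so x2.c1 = 0 evaluates to unknown, the ELSE branch round(x2.c1/12) is NULL, and SUM over a group whose only input is NULL is NULL. The same behaviour can be reproduced with a literal NULL standing in for x2.c1 (a minimal sketch; assumes the engine accepts a VALUES row constructor as a derived table):
> {code:sql}
> -- a single NULL input: the CASE yields NULL, and SUM of an all-NULL group is NULL
> select sum(case when c1 = 0 then 100 else round(c1/12) end) as sum_with_case
> from (values (cast(null as double))) as t(c1);
> {code}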
> Query plan for correct result:
> {code}
> 00-01      Project(simple_sum=[$0], sum_with_case=[$1])
> 00-02        UnionExchange
> 01-01          Project(simple_sum=[$1], sum_with_case=[$2])
> 01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
> Wrong result: sum_with_case returns 0.0 instead of NULL:
> {code}
> 0: jdbc:drill:schema=dfs> select
> . . . . . . . . . . . . >         count(*)                                                   as count_star,
> . . . . . . . . . . . . >         sum(x2.c1)                                                 as simle_sum,
> . . . . . . . . . . . . >         round(avg(x2.c1))                                          as round_avg_x2_c1,
> . . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> . . . . . . . . . . . . > from
> . . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
> . . . . . . . . . . . . > group by
> . . . . . . . . . . . . >         x1.c1;
> +------------+------------+-----------------+---------------+
> | count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
> +------------+------------+-----------------+---------------+
> | 1          | null       | null            | 0.0           |
> +------------+------------+-----------------+---------------+
> {code}
> Query plan for the wrong result query:
> {code}
> 00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
> 00-02        UnionExchange
> 01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
> 01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
> 01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
> 01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
> 01-06                  HashToRandomExchange(dist0=[[$0]])
> 02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
> 01-05                  Project(c10=[$0])
> 01-07                    HashToRandomExchange(dist0=[[$0]])
> 03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
> {code}
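> Comparing the two plans: in the correct plan the HashAgg computes SUM directly, and SUM returns NULL when every input in the group is NULL. In the wrong plan, presumably because the AVG has to be decomposed into SUM and COUNT, the HashAgg computes $SUM0 plus COUNT instead, and the final Project is expected to restore the NULL. It does so for simle_sum via CASE(=($3, 0), null, $2), but sum_with_case is taken straight from the $SUM0 of the CASE expression ($4) with no such guard, and $SUM0 returns 0 for an all-NULL group. Writing the same guard by hand at the SQL level should force the expected NULL even with round(avg(x2.c1)) present (a sketch; count(x2.c1) stands in for a count of the CASE expression, since that expression is NULL exactly when x2.c1 is NULL):
> {code:sql}
> select
>         round(avg(x2.c1)) as round_avg_x2_c1,
>         -- explicit COUNT guard mirroring the NULL-restoring CASE
>         -- the planner already applies to the plain sum
>         case when count(x2.c1) = 0 then null
>              else sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end)
>         end               as sum_with_case_guarded
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}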
> Cut/paste version of the query; commenting out the round(avg(x2.c1)) line makes it return the correct result:
> {code:sql}
> select
>         count(*)     as count_star,                                
>         sum(x2.c1)   as simle_sum,                                 
>         round(avg(x2.c1))  as round_avg_x2_c1,                     
>         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}
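> For reference, the same query with that line commented out:
> {code:sql}
> select
>         count(*)     as count_star,
>         sum(x2.c1)   as simle_sum,
>         -- round(avg(x2.c1))  as round_avg_x2_c1,
>         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
> from
>         x1 left outer join x2 on x1.c1 = x2.c1
> group by
>         x1.c1;
> {code}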
> Not sure how critical this bug is, but the inconsistency between these results is bad.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)