You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/02/14 00:19:12 UTC

[jira] [Created] (DRILL-2247) SUM with CASE statement on the column of the null producing side of left outer join returns wrong result

Victoria Markman created DRILL-2247:
---------------------------------------

             Summary: SUM with CASE statement on the column of the null producing side of left outer join returns wrong result
                 Key: DRILL-2247
                 URL: https://issues.apache.org/jira/browse/DRILL-2247
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.8.0
            Reporter: Victoria Markman
            Assignee: Jinfeng Ni
            Priority: Critical


{code}
0: jdbc:drill:schema=dfs> select * from x1;
+------------+
|     c1     |
+------------+
| 1032.6516  |
+------------+
1 row selected (0.093 seconds)

0: jdbc:drill:schema=dfs> select * from x2;
+------------+
|     c1     |
+------------+
| 1057.3117  |
| 1090.8299  |
+------------+
2 rows selected (0.085 seconds)
{code}
Correct result is NULL:
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         sum(x2.c1)                                                 as simple_sum,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+---------------+
| simple_sum | sum_with_case |
+------------+---------------+
| 0.0000     | null          |
+------------+---------------+
1 row selected (0.173 seconds)
{code}

Query plan for correct result:
{code}
00-01      Project(simple_sum=[$0], sum_with_case=[$1])
00-02        UnionExchange
01-01          Project(simple_sum=[$1], sum_with_case=[$2])
01-02            HashAgg(group=[{0}], simple_sum=[SUM($1)], sum_with_case=[SUM($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}
Wrong result: sum with case returns 0.0 instead of NULL
{code}
0: jdbc:drill:schema=dfs> select
. . . . . . . . . . . . >         count(*)                                                   as count_star,
. . . . . . . . . . . . >         sum(x2.c1)                                                 as simle_sum,
. . . . . . . . . . . . >         round(avg(x2.c1))                                          as round_avg_x2_c1,
. . . . . . . . . . . . >         sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
. . . . . . . . . . . . > from
. . . . . . . . . . . . >         x1 left outer join x2 on x1.c1 = x2.c1
. . . . . . . . . . . . > group by
. . . . . . . . . . . . >         x1.c1;
+------------+------------+-----------------+---------------+
| count_star | simle_sum  | round_avg_x2_c1 | sum_with_case |
+------------+------------+-----------------+---------------+
| 1          | null       | null            | 0.0           |
+------------+------------+-----------------+---------------+
{code}

Query plan for the wrong result query:
{code}
00-01      Project(count_star=[$0], simle_sum=[$1], round_avg_x2_c1=[$2], sum_with_case=[$3])
00-02        UnionExchange
01-01          Project(count_star=[$1], simle_sum=[CASE(=($3, 0), null, $2)], round_avg_x2_c1=[ROUND(CAST(/(CastHigh(CASE(=($3, 0), null, $2)), $3)):ANY)], sum_with_case=[$4])
01-02            HashAgg(group=[{0}], count_star=[COUNT()], agg#1=[$SUM0($1)], agg#2=[COUNT($1)], agg#3=[$SUM0($2)])
01-03              Project(c1=[$0], c10=[$1], $f2=[CASE(=($1, 0), CAST(100):ANY NOT NULL, ROUND(/($1, 12)))])
01-04                HashJoin(condition=[=($0, $1)], joinType=[left])
01-06                  HashToRandomExchange(dist0=[[$0]])
02-01                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x1]], selectionRoot=/aggregation/x1, numFiles=1, columns=[`c1`]]])
01-05                  Project(c10=[$0])
01-07                    HashToRandomExchange(dist0=[[$0]])
03-01                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/aggregation/x2]], selectionRoot=/aggregation/x2, numFiles=1, columns=[`c1`]]])
{code}

Cut/paste version of the query, comment out round function, query will return correct result.
{code:sql}
select
        count(*)     as count_star,                                
        sum(x2.c1)   as simle_sum,                                 
        round(avg(x2.c1))  as round_avg_x2_c1,                     
        sum(case when x2.c1 = 0 then 100 else round(x2.c1/12) end) as sum_with_case
from
        x1 left outer join x2 on x1.c1 = x2.c1
group by
        x1.c1;
{code}




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)