Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2015/06/05 17:30:00 UTC

[jira] [Commented] (DRILL-3254) Average over window functions returns wrong results

    [ https://issues.apache.org/jira/browse/DRILL-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574683#comment-14574683 ] 

Deneche A. Hakim commented on DRILL-3254:
-----------------------------------------

Here is the plan for AVG with OVER clause:
{noformat}
0: jdbc:drill:zk=local> explain plan for select avg(salary) over(partition by position_id) from `windowData/b1.p1`;
00-00    Screen
00-01      Project(EXPR$0=[/(CASE(>($2, 0), CAST($3):ANY, null), $2)])
00-02        Window(window#0=[window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), $SUM0($0)])])
00-03          SelectionVectorRemover
00-04            Sort(sort0=[$1], dir0=[ASC])
00-05              Project(salary=[$1], position_id=[$0])
00-06                Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hakim/MapR/data/windowData/b1.p1, numFiles=1, columns=[`salary`, `position_id`], files=[file:/Users/hakim/MapR/data/windowData/b1.p1/0.data.json]]])
...
{noformat}

and without OVER clause:
{noformat}
0: jdbc:drill:zk=local> explain plan for select avg(salary) from `windowData/b1.p1`;
00-00    Screen
00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
00-03          Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hakim/MapR/data/windowData/b1.p1, numFiles=1, columns=[`salary`], files=[file:/Users/hakim/MapR/data/windowData/b1.p1/0.data.json]]])
...
{noformat}

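Both queries reduce {{AVG(X)}} to {{SUM(X) / COUNT(X)}}, but with the OVER clause the {{CastHigh}} is missing from the final Project. The division then appears to be done on the original integer type, which would explain the truncated results.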
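
To illustrate the effect outside of Drill, here is a minimal Python sketch of the {{SUM(X) / COUNT(X)}} reduction with and without widening the sum before the division. It is not Drill code: the {{running_avg}} helper and its {{cast_high}} flag are hypothetical names, and the three input values are taken from the Fairview rows of the GROUP BY output in the description, assuming one row per {{s_store_sk}}.

```python
def running_avg(values, cast_high):
    """Running AVG reduced to SUM / COUNT over a growing frame.

    cast_high=True widens the sum to float before dividing (the role the
    missing cast is assumed to play); cast_high=False divides integers.
    """
    results = []
    total = 0
    for count, value in enumerate(values, start=1):
        total += value
        if cast_high:
            results.append(float(total) / count)
        else:
            # Floor division; equal to truncation for non-negative inputs.
            results.append(total // count)
    return results


# Fairview partition, ordered by s_store_sk (assumed one row per store).
fairview = [288, 278, 294]

truncated = running_avg(fairview, cast_high=False)
exact = running_avg(fairview, cast_high=True)

print(truncated)  # integer results, like the Drill window output
print(exact)      # fractional results, like the Postgres output
```

With these inputs the integer variant yields 288, 283, 286, which matches the Drill window results, while the widened variant yields 288.0, 283.0 and roughly 286.67, which matches Postgres.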

> Average over window functions returns wrong results
> ---------------------------------------------------
>
>                 Key: DRILL-3254
>                 URL: https://issues.apache.org/jira/browse/DRILL-3254
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.1.0
>            Reporter: Abhishek Girish
>            Assignee: Deneche A. Hakim
>              Labels: window_function
>             Fix For: 1.1.0
>
>
> The average function over a window on a numeric column returns an (inaccurate) integer value instead of an (accurate) decimal (or floating point) value.
> *Results from Drill:*
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY s_city ORDER BY s_store_sk) from store limit 10;
> +-----------+-------------+---------+
> |  s_city   | s_store_sk  | EXPR$2  |
> +-----------+-------------+---------+
> | Fairview  | 5           | 288     |
> | Fairview  | 8           | 283     |
> | Fairview  | 12          | 286     |
> | Midway    | 1           | 245     |
> | Midway    | 2           | 240     |
> | Midway    | 3           | 239     |
> | Midway    | 4           | 233     |
> | Midway    | 6           | 232     |
> | Midway    | 7           | 243     |
> | Midway    | 9           | 247     |
> +-----------+-------------+---------+
> 10 rows selected (0.197 seconds)
> {code}
> *Results from Postgres:*
> {code:sql}
> # select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY s_city ORDER BY s_store_sk) from store limit 10;
>   s_city  | s_store_sk |         avg          
> ----------+------------+----------------------
>  Fairview |          5 | 288.0000000000000000
>  Fairview |          8 | 283.0000000000000000
>  Fairview |         12 | 286.6666666666666667
>  Midway   |          1 | 245.0000000000000000
>  Midway   |          2 | 240.5000000000000000
>  Midway   |          3 | 239.0000000000000000
>  Midway   |          4 | 233.7500000000000000
>  Midway   |          6 | 232.8000000000000000
>  Midway   |          7 | 243.5000000000000000
>  Midway   |          9 | 247.4285714285714286
> (10 rows)
> {code}
> Drill returns correct results without window functions:
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) from store group by s_city, s_store_sk order by 1,2 limit 10;
> +-----------+-------------+---------+
> |  s_city   | s_store_sk  | EXPR$2  |
> +-----------+-------------+---------+
> | Fairview  | 5           | 288.0   |
> | Fairview  | 8           | 278.0   |
> | Fairview  | 12          | 294.0   |
> | Midway    | 1           | 245.0   |
> | Midway    | 2           | 236.0   |
> | Midway    | 3           | 236.0   |
> | Midway    | 4           | 218.0   |
> | Midway    | 6           | 229.0   |
> | Midway    | 7           | 297.0   |
> | Midway    | 9           | 271.0   |
> +-----------+-------------+---------+
> 10 rows selected (0.306 seconds)
> {code}


