Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2015/06/05 17:30:00 UTC
[jira] [Commented] (DRILL-3254) Average over window functions returns wrong results
[ https://issues.apache.org/jira/browse/DRILL-3254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574683#comment-14574683 ]
Deneche A. Hakim commented on DRILL-3254:
-----------------------------------------
Here is the plan for AVG with OVER clause:
{noformat}
0: jdbc:drill:zk=local> explain plan for select avg(salary) over(partition by position_id) from `windowData/b1.p1`;
00-00 Screen
00-01 Project(EXPR$0=[/(CASE(>($2, 0), CAST($3):ANY, null), $2)])
00-02 Window(window#0=[window(partition {1} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($0), $SUM0($0)])])
00-03 SelectionVectorRemover
00-04 Sort(sort0=[$1], dir0=[ASC])
00-05 Project(salary=[$1], position_id=[$0])
00-06 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hakim/MapR/data/windowData/b1.p1, numFiles=1, columns=[`salary`, `position_id`], files=[file:/Users/hakim/MapR/data/windowData/b1.p1/0.data.json]]])
...
{noformat}
and without OVER clause:
{noformat}
0: jdbc:drill:zk=local> explain plan for select avg(salary) from `windowData/b1.p1`;
00-00 Screen
00-01 Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT NULL])
00-02 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
00-03 Scan(groupscan=[EasyGroupScan [selectionRoot=/Users/hakim/MapR/data/windowData/b1.p1, numFiles=1, columns=[`salary`], files=[file:/Users/hakim/MapR/data/windowData/b1.p1/0.data.json]]])
...
{noformat}
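Both queries reduce {{AVG(X)}} to {{SUM(X) / COUNT(X)}}, but with the OVER clause the final Project divides the sum by the count without the {{CastHigh}} that the plain aggregate applies first. That would explain the integer results reported in the issue.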
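To make the effect concrete, here is a minimal sketch in Python (not Drill code). It assumes that, without the cast, the division stays in integer arithmetic, and that {{CastHigh}} amounts to widening the sum to a floating-point type before dividing; both are assumptions for illustration. The input values are the per-store {{s_number_employees}} for the Midway partition, read from the GROUP BY output in the issue description quoted below; {{running_avg}} and {{cast_high}} are hypothetical names.

```python
# Per-store employee counts for the Midway partition, ordered by s_store_sk,
# taken from the GROUP BY output in the issue description.
midway = [245, 236, 236, 218, 229, 297, 271]


def running_avg(values, cast_high):
    """Running SUM / COUNT over one partition.

    cast_high=True widens the sum to float before dividing (an assumed
    stand-in for CastHigh); cast_high=False divides integer by integer.
    """
    out, total = [], 0
    for count, value in enumerate(values, start=1):
        total += value
        if cast_high:
            out.append(float(total) / count)  # widen first, then divide
        else:
            out.append(total // count)  # integer division drops the fraction
    return out


without_cast = running_avg(midway, cast_high=False)
with_cast = running_avg(midway, cast_high=True)

print(without_cast)
print([round(x, 4) for x in with_cast])
```

Under these assumptions the first list matches the Midway rows Drill returned for the windowed query (245, 240, 239, 233, 232, 243, 247), and the second matches the Postgres rows (245.0, 240.5, 239.0, 233.75, 232.8, 243.5, 247.4286 after rounding).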
> Average over window functions returns wrong results
> ---------------------------------------------------
>
> Key: DRILL-3254
> URL: https://issues.apache.org/jira/browse/DRILL-3254
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.1.0
> Reporter: Abhishek Girish
> Assignee: Deneche A. Hakim
> Labels: window_function
> Fix For: 1.1.0
>
>
> Average function on numeric column returns an (inaccurate) integer value, instead of an (accurate) decimal (or floating point) value.
> *Results from Drill:*
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY s_city ORDER BY s_store_sk) from store limit 10;
> +-----------+-------------+---------+
> | s_city | s_store_sk | EXPR$2 |
> +-----------+-------------+---------+
> | Fairview | 5 | 288 |
> | Fairview | 8 | 283 |
> | Fairview | 12 | 286 |
> | Midway | 1 | 245 |
> | Midway | 2 | 240 |
> | Midway | 3 | 239 |
> | Midway | 4 | 233 |
> | Midway | 6 | 232 |
> | Midway | 7 | 243 |
> | Midway | 9 | 247 |
> +-----------+-------------+---------+
> 10 rows selected (0.197 seconds)
> {code}
> *Results from Postgres:*
> {code:sql}
> # select s_city, s_store_sk, avg(s_number_employees) over (PARTITION BY s_city ORDER BY s_store_sk) from store limit 10;
> s_city | s_store_sk | avg
> ----------+------------+----------------------
> Fairview | 5 | 288.0000000000000000
> Fairview | 8 | 283.0000000000000000
> Fairview | 12 | 286.6666666666666667
> Midway | 1 | 245.0000000000000000
> Midway | 2 | 240.5000000000000000
> Midway | 3 | 239.0000000000000000
> Midway | 4 | 233.7500000000000000
> Midway | 6 | 232.8000000000000000
> Midway | 7 | 243.5000000000000000
> Midway | 9 | 247.4285714285714286
> (10 rows)
> {code}
> Drill returns correct (non-truncated) results without window functions:
> {code:sql}
> > select s_city, s_store_sk, avg(s_number_employees) from store group by s_city, s_store_sk order by 1,2 limit 10;
> +-----------+-------------+---------+
> | s_city | s_store_sk | EXPR$2 |
> +-----------+-------------+---------+
> | Fairview | 5 | 288.0 |
> | Fairview | 8 | 278.0 |
> | Fairview | 12 | 294.0 |
> | Midway | 1 | 245.0 |
> | Midway | 2 | 236.0 |
> | Midway | 3 | 236.0 |
> | Midway | 4 | 218.0 |
> | Midway | 6 | 229.0 |
> | Midway | 7 | 297.0 |
> | Midway | 9 | 271.0 |
> +-----------+-------------+---------+
> 10 rows selected (0.306 seconds)
> {code}