You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2015/01/06 18:00:37 UTC
[jira] [Comment Edited] (DRILL-1908) new window function
implementation
[ https://issues.apache.org/jira/browse/DRILL-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266386#comment-14266386 ]
Deneche A. Hakim edited comment on DRILL-1908 at 1/6/15 4:59 PM:
-----------------------------------------------------------------
I rebased the code and added unit tests. The code still needs cleaning and proper commenting, but it already fixes DRILL-1487 *partially* (it still doesn't handle _order by_ clause if it's different from the _partition by_ clause).
Here is an example of a query that returns correct results:
{noformat}
SELECT employee_id,position_id, salary, avg(salary) OVER (PARTITION BY position_id ORDER BY position_id) AS avg_sal FROM cp.`employee.json` ORDER BY position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id | salary | avg_sal |
+-------------+-------------+------------+------------+
| 1 | 1 | 80000.0 | 80000.0 |
| 5 | 2 | 35000.0 | 35833.333333333336 |
| 2 | 2 | 40000.0 | 35833.333333333336 |
| 4 | 2 | 40000.0 | 35833.333333333336 |
| 20 | 2 | 30000.0 | 35833.333333333336 |
| 21 | 2 | 35000.0 | 35833.333333333336 |
| 22 | 2 | 35000.0 | 35833.333333333336 |
| 6 | 3 | 25000.0 | 25000.0 |
| 7 | 4 | 15000.0 | 15000.0 |
| 10 | 5 | 50000.0 | 50000.0 |
| 40 | 6 | 10000.0 | 8675.0 |
| 39 | 6 | 10000.0 | 8675.0 |
| 37 | 6 | 6700.0 | 8675.0 |
| 38 | 6 | 8000.0 | 8675.0 |
| 42 | 7 | 5000.0 | 19500.0 |
| 36 | 7 | 45000.0 | 19500.0 |
| 41 | 7 | 8500.0 | 19500.0 |
| 43 | 8 | 6700.0 | 5850.0 |
| 44 | 8 | 5000.0 | 5850.0 |
| 47 | 9 | 6500.0 | 5887.5 |
| 48 | 9 | 7200.0 | 5887.5 |
| 49 | 9 | 5000.0 | 5887.5 |
| 50 | 9 | 5000.0 | 5887.5 |
| 51 | 9 | 5000.0 | 5887.5 |
| 52 | 9 | 5000.0 | 5887.5 |
| 45 | 9 | 6800.0 | 5887.5 |
| 46 | 9 | 6600.0 | 5887.5 |
| 9 | 11 | 17000.0 | 13625.0 |
| 13 | 11 | 15000.0 | 13625.0 |
| 14 | 11 | 17000.0 | 13625.0 |
| 15 | 11 | 12000.0 | 13625.0 |
| 11 | 11 | 15000.0 | 13625.0 |
| 8 | 11 | 10000.0 | 13625.0 |
| 28 | 11 | 14000.0 | 13625.0 |
| 29 | 11 | 8500.0 | 13625.0 |
| 30 | 11 | 15000.0 | 13625.0 |
| 16 | 11 | 17000.0 | 13625.0 |
| 17 | 11 | 10000.0 | 13625.0 |
| 18 | 11 | 17000.0 | 13625.0 |
| 31 | 11 | 14000.0 | 13625.0 |
| 19 | 11 | 10000.0 | 13625.0 |
| 32 | 11 | 17000.0 | 13625.0 |
| 23 | 11 | 15000.0 | 13625.0 |
| 24 | 11 | 17000.0 | 13625.0 |
| 25 | 11 | 12000.0 | 13625.0 |
| 26 | 11 | 15000.0 | 13625.0 |
| 27 | 11 | 16000.0 | 13625.0 |
| 12 | 11 | 8500.0 | 13625.0 |
| 33 | 11 | 17000.0 | 13625.0 |
| 34 | 11 | 7000.0 | 13625.0 |
+-------------+-------------+------------+------------+
{noformat}
Once I clean this code, I will start working on proper _order by_ handling.
was (Author: adeneche):
I rebased the code and added unit tests. The code still needs cleaning and proper commenting, but it already fixes DRILL-1487.
Here is an example of a query that returns correct results:
{noformat}
SELECT employee_id,position_id, salary, avg(salary) OVER (PARTITION BY position_id ORDER BY position_id) AS avg_sal FROM cp.`employee.json` ORDER BY position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id | salary | avg_sal |
+-------------+-------------+------------+------------+
| 1 | 1 | 80000.0 | 80000.0 |
| 5 | 2 | 35000.0 | 35833.333333333336 |
| 2 | 2 | 40000.0 | 35833.333333333336 |
| 4 | 2 | 40000.0 | 35833.333333333336 |
| 20 | 2 | 30000.0 | 35833.333333333336 |
| 21 | 2 | 35000.0 | 35833.333333333336 |
| 22 | 2 | 35000.0 | 35833.333333333336 |
| 6 | 3 | 25000.0 | 25000.0 |
| 7 | 4 | 15000.0 | 15000.0 |
| 10 | 5 | 50000.0 | 50000.0 |
| 40 | 6 | 10000.0 | 8675.0 |
| 39 | 6 | 10000.0 | 8675.0 |
| 37 | 6 | 6700.0 | 8675.0 |
| 38 | 6 | 8000.0 | 8675.0 |
| 42 | 7 | 5000.0 | 19500.0 |
| 36 | 7 | 45000.0 | 19500.0 |
| 41 | 7 | 8500.0 | 19500.0 |
| 43 | 8 | 6700.0 | 5850.0 |
| 44 | 8 | 5000.0 | 5850.0 |
| 47 | 9 | 6500.0 | 5887.5 |
| 48 | 9 | 7200.0 | 5887.5 |
| 49 | 9 | 5000.0 | 5887.5 |
| 50 | 9 | 5000.0 | 5887.5 |
| 51 | 9 | 5000.0 | 5887.5 |
| 52 | 9 | 5000.0 | 5887.5 |
| 45 | 9 | 6800.0 | 5887.5 |
| 46 | 9 | 6600.0 | 5887.5 |
| 9 | 11 | 17000.0 | 13625.0 |
| 13 | 11 | 15000.0 | 13625.0 |
| 14 | 11 | 17000.0 | 13625.0 |
| 15 | 11 | 12000.0 | 13625.0 |
| 11 | 11 | 15000.0 | 13625.0 |
| 8 | 11 | 10000.0 | 13625.0 |
| 28 | 11 | 14000.0 | 13625.0 |
| 29 | 11 | 8500.0 | 13625.0 |
| 30 | 11 | 15000.0 | 13625.0 |
| 16 | 11 | 17000.0 | 13625.0 |
| 17 | 11 | 10000.0 | 13625.0 |
| 18 | 11 | 17000.0 | 13625.0 |
| 31 | 11 | 14000.0 | 13625.0 |
| 19 | 11 | 10000.0 | 13625.0 |
| 32 | 11 | 17000.0 | 13625.0 |
| 23 | 11 | 15000.0 | 13625.0 |
| 24 | 11 | 17000.0 | 13625.0 |
| 25 | 11 | 12000.0 | 13625.0 |
| 26 | 11 | 15000.0 | 13625.0 |
| 27 | 11 | 16000.0 | 13625.0 |
| 12 | 11 | 8500.0 | 13625.0 |
| 33 | 11 | 17000.0 | 13625.0 |
| 34 | 11 | 7000.0 | 13625.0 |
+-------------+-------------+------------+------------+
{noformat}
> new window function implementation
> ----------------------------------
>
> Key: DRILL-1908
> URL: https://issues.apache.org/jira/browse/DRILL-1908
> Project: Apache Drill
> Issue Type: Improvement
> Components: Execution - Operators
> Reporter: Deneche A. Hakim
> Assignee: Deneche A. Hakim
> Priority: Critical
> Fix For: Future
>
>
> In order to fix DRILL-1487 a complete rewrite of the StreamingWindowFrameRecordBatch may be needed. The purpose of this issue is to report my progress and share my thoughts with the community in order to get a proper implementation
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)