You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Deneche A. Hakim (JIRA)" <ji...@apache.org> on 2015/01/06 18:00:37 UTC

[jira] [Comment Edited] (DRILL-1908) new window function implementation

    [ https://issues.apache.org/jira/browse/DRILL-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14266386#comment-14266386 ] 

Deneche A. Hakim edited comment on DRILL-1908 at 1/6/15 4:59 PM:
-----------------------------------------------------------------

I rebased the code and added unit tests. The code still needs cleaning and proper commenting, but it already fixes DRILL-1487 *partially* (it still doesn't handle _order by_ clause if it's different from the _partition by_ clause).
Here is an example of a query that returns correct results:
{noformat}
SELECT employee_id,position_id, salary, avg(salary) OVER (PARTITION BY position_id ORDER BY position_id) AS avg_sal FROM cp.`employee.json` ORDER BY position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id |   salary   |  avg_sal   |
+-------------+-------------+------------+------------+
| 1           | 1           | 80000.0    | 80000.0    |
| 5           | 2           | 35000.0    | 35833.333333333336 |
| 2           | 2           | 40000.0    | 35833.333333333336 |
| 4           | 2           | 40000.0    | 35833.333333333336 |
| 20          | 2           | 30000.0    | 35833.333333333336 |
| 21          | 2           | 35000.0    | 35833.333333333336 |
| 22          | 2           | 35000.0    | 35833.333333333336 |
| 6           | 3           | 25000.0    | 25000.0    |
| 7           | 4           | 15000.0    | 15000.0    |
| 10          | 5           | 50000.0    | 50000.0    |
| 40          | 6           | 10000.0    | 8675.0     |
| 39          | 6           | 10000.0    | 8675.0     |
| 37          | 6           | 6700.0     | 8675.0     |
| 38          | 6           | 8000.0     | 8675.0     |
| 42          | 7           | 5000.0     | 19500.0    |
| 36          | 7           | 45000.0    | 19500.0    |
| 41          | 7           | 8500.0     | 19500.0    |
| 43          | 8           | 6700.0     | 5850.0     |
| 44          | 8           | 5000.0     | 5850.0     |
| 47          | 9           | 6500.0     | 5887.5     |
| 48          | 9           | 7200.0     | 5887.5     |
| 49          | 9           | 5000.0     | 5887.5     |
| 50          | 9           | 5000.0     | 5887.5     |
| 51          | 9           | 5000.0     | 5887.5     |
| 52          | 9           | 5000.0     | 5887.5     |
| 45          | 9           | 6800.0     | 5887.5     |
| 46          | 9           | 6600.0     | 5887.5     |
| 9           | 11          | 17000.0    | 13625.0    |
| 13          | 11          | 15000.0    | 13625.0    |
| 14          | 11          | 17000.0    | 13625.0    |
| 15          | 11          | 12000.0    | 13625.0    |
| 11          | 11          | 15000.0    | 13625.0    |
| 8           | 11          | 10000.0    | 13625.0    |
| 28          | 11          | 14000.0    | 13625.0    |
| 29          | 11          | 8500.0     | 13625.0    |
| 30          | 11          | 15000.0    | 13625.0    |
| 16          | 11          | 17000.0    | 13625.0    |
| 17          | 11          | 10000.0    | 13625.0    |
| 18          | 11          | 17000.0    | 13625.0    |
| 31          | 11          | 14000.0    | 13625.0    |
| 19          | 11          | 10000.0    | 13625.0    |
| 32          | 11          | 17000.0    | 13625.0    |
| 23          | 11          | 15000.0    | 13625.0    |
| 24          | 11          | 17000.0    | 13625.0    |
| 25          | 11          | 12000.0    | 13625.0    |
| 26          | 11          | 15000.0    | 13625.0    |
| 27          | 11          | 16000.0    | 13625.0    |
| 12          | 11          | 8500.0     | 13625.0    |
| 33          | 11          | 17000.0    | 13625.0    |
| 34          | 11          | 7000.0     | 13625.0    |
+-------------+-------------+------------+------------+
{noformat}

Once I clean this code, I will start working on proper _order by_ handling.


was (Author: adeneche):
I rebased the code and added unit tests. The code still needs cleaning and proper commenting, but it already fixes DRILL-1487.
Here is an example of a query that returns correct results:
{noformat}
SELECT employee_id,position_id, salary, avg(salary) OVER (PARTITION BY position_id ORDER BY position_id) AS avg_sal FROM cp.`employee.json` ORDER BY position_id LIMIT 50;
+-------------+-------------+------------+------------+
| employee_id | position_id |   salary   |  avg_sal   |
+-------------+-------------+------------+------------+
| 1           | 1           | 80000.0    | 80000.0    |
| 5           | 2           | 35000.0    | 35833.333333333336 |
| 2           | 2           | 40000.0    | 35833.333333333336 |
| 4           | 2           | 40000.0    | 35833.333333333336 |
| 20          | 2           | 30000.0    | 35833.333333333336 |
| 21          | 2           | 35000.0    | 35833.333333333336 |
| 22          | 2           | 35000.0    | 35833.333333333336 |
| 6           | 3           | 25000.0    | 25000.0    |
| 7           | 4           | 15000.0    | 15000.0    |
| 10          | 5           | 50000.0    | 50000.0    |
| 40          | 6           | 10000.0    | 8675.0     |
| 39          | 6           | 10000.0    | 8675.0     |
| 37          | 6           | 6700.0     | 8675.0     |
| 38          | 6           | 8000.0     | 8675.0     |
| 42          | 7           | 5000.0     | 19500.0    |
| 36          | 7           | 45000.0    | 19500.0    |
| 41          | 7           | 8500.0     | 19500.0    |
| 43          | 8           | 6700.0     | 5850.0     |
| 44          | 8           | 5000.0     | 5850.0     |
| 47          | 9           | 6500.0     | 5887.5     |
| 48          | 9           | 7200.0     | 5887.5     |
| 49          | 9           | 5000.0     | 5887.5     |
| 50          | 9           | 5000.0     | 5887.5     |
| 51          | 9           | 5000.0     | 5887.5     |
| 52          | 9           | 5000.0     | 5887.5     |
| 45          | 9           | 6800.0     | 5887.5     |
| 46          | 9           | 6600.0     | 5887.5     |
| 9           | 11          | 17000.0    | 13625.0    |
| 13          | 11          | 15000.0    | 13625.0    |
| 14          | 11          | 17000.0    | 13625.0    |
| 15          | 11          | 12000.0    | 13625.0    |
| 11          | 11          | 15000.0    | 13625.0    |
| 8           | 11          | 10000.0    | 13625.0    |
| 28          | 11          | 14000.0    | 13625.0    |
| 29          | 11          | 8500.0     | 13625.0    |
| 30          | 11          | 15000.0    | 13625.0    |
| 16          | 11          | 17000.0    | 13625.0    |
| 17          | 11          | 10000.0    | 13625.0    |
| 18          | 11          | 17000.0    | 13625.0    |
| 31          | 11          | 14000.0    | 13625.0    |
| 19          | 11          | 10000.0    | 13625.0    |
| 32          | 11          | 17000.0    | 13625.0    |
| 23          | 11          | 15000.0    | 13625.0    |
| 24          | 11          | 17000.0    | 13625.0    |
| 25          | 11          | 12000.0    | 13625.0    |
| 26          | 11          | 15000.0    | 13625.0    |
| 27          | 11          | 16000.0    | 13625.0    |
| 12          | 11          | 8500.0     | 13625.0    |
| 33          | 11          | 17000.0    | 13625.0    |
| 34          | 11          | 7000.0     | 13625.0    |
+-------------+-------------+------------+------------+
{noformat}

> new window function implementation
> ----------------------------------
>
>                 Key: DRILL-1908
>                 URL: https://issues.apache.org/jira/browse/DRILL-1908
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Operators
>            Reporter: Deneche A. Hakim
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>             Fix For: Future
>
>
> In order to fix DRILL-1487 a complete rewrite of the StreamingWindowFrameRecordBatch may be needed. The purpose of this issue is to report my progress and share my thoughts with the community in order to get a proper implementation



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)