You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/05/26 20:53:17 UTC

[jira] [Created] (DRILL-3182) Window function with DISTINCT qualifier returns seemingly incorrect result

Victoria Markman created DRILL-3182:
---------------------------------------

             Summary: Window function with DISTINCT qualifier returns seemingly incorrect result
                 Key: DRILL-3182
                 URL: https://issues.apache.org/jira/browse/DRILL-3182
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.0.0
            Reporter: Victoria Markman
            Assignee: Daniel Barclay (Drill)


Both count(distinct <colname>) and count(all <colname>) return the same result. It does not look correct to me and I'm not sure what the correct behavior is going to be.

(1) Latest postgres does not support distinct with Window functions:

postgres=# select a2, count(distinct b2) over(partition by a2) from t2;
ERROR:  DISTINCT is not implemented for window functions
LINE 1: select a2, count(distinct b2) over(partition by a2) from t2;
                   ^
postgres=# select a2, avg(distinct a2) over(partition by a2) from t2;
ERROR:  DISTINCT is not implemented for window functions
LINE 1: select a2, avg(distinct a2) over(partition by a2) from t2;

(2) Calcite does not support this either:  https://github.com/apache/incubator-calcite/blob/master/doc/reference.md

Do we support it ? If not,  I think we should throw an error ...

{code}
0: jdbc:drill:schema=dfs> select * from t2;
+-----+--------+-------------+
| a2  |   b2   |     c2      |
+-----+--------+-------------+
| 0   | zzz    | 2014-12-31  |
| 1   | aaaaa  | 2015-01-01  |
| 2   | bbbbb  | 2015-01-02  |
| 2   | bbbbb  | 2015-01-02  |
| 2   | bbbbb  | 2015-01-02  |
| 3   | ccccc  | 2015-01-03  |
| 4   | ddddd  | 2015-01-04  |
| 5   | eeeee  | 2015-01-05  |
| 6   | fffff  | 2015-01-06  |
| 7   | ggggg  | 2015-01-07  |
| 7   | ggggg  | 2015-01-07  |
| 8   | hhhhh  | 2015-01-08  |
| 9   | iiiii  | 2015-01-09  |
+-----+--------+-------------+
13 rows selected (0.134 seconds)

0: jdbc:drill:schema=dfs> select a2, count(distinct b2) over(partition by a2) from t2;
+-----+---------+
| a2  | EXPR$1  |
+-----+---------+
| 0   | 1       |
| 1   | 1       |
| 2   | 3       |
| 2   | 3       |
| 2   | 3       |
| 3   | 1       |
| 4   | 1       |
| 5   | 1       |
| 6   | 1       |
| 7   | 2       |
| 7   | 2       |
| 8   | 1       |
| 9   | 1       |
+-----+---------+
13 rows selected (0.224 seconds)

0: jdbc:drill:schema=dfs> select a2, count(b2) over(partition by a2) from t2;
+-----+---------+
| a2  | EXPR$1  |
+-----+---------+
| 0   | 1       |
| 1   | 1       |
| 2   | 3       |
| 2   | 3       |
| 2   | 3       |
| 3   | 1       |
| 4   | 1       |
| 5   | 1       |
| 6   | 1       |
| 7   | 2       |
| 7   | 2       |
| 8   | 1       |
| 9   | 1       |
+-----+---------+
13 rows selected (0.219 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)