You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2016/03/10 07:22:40 UTC
[jira] [Commented] (DRILL-4494) Window sum over integer column returns incorrect results.

    [ https://issues.apache.org/jira/browse/DRILL-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15188758#comment-15188758 ] 

Khurram Faraaz commented on DRILL-4494:
---------------------------------------

Query plan from Drill 1.6.0

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for  select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(w_sum=[$0])
00-02        Project(w0$o0=[$2])
00-03          Window(window#0=[window(partition {0} order by [0 DESC-nulls-first] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [SUM($1)])])
00-04            SelectionVectorRemover
00-05              Sort(sort0=[$0], sort1=[$0], dir0=[ASC], dir1=[DESC-nulls-first])
00-06                Project(T5¦¦*=[$0], $1=[ITEM($0, 'c2')])
00-07                  SelectionVectorRemover
00-08                    Sort(sort0=[$1], sort1=[$2], dir0=[ASC], dir1=[DESC-nulls-last])
00-09                      Project(T5¦¦*=[$0], c1=[$1], c2=[$2])
00-10                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t_alltype]], selectionRoot=maprfs:/tmp/t_alltype, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{noformat}

> Window sum over integer column returns incorrect results.
> ---------------------------------------------------------
>
>                 Key: DRILL-4494
>                 URL: https://issues.apache.org/jira/browse/DRILL-4494
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.6.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Assignee: Deneche A. Hakim
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.7.0
>
>         Attachments: 0_0_0.parquet, t_alltype.csv
>
>
> Window sum over integer column returns incorrect results.
> Drill 1.6.0, git commit ID : git.commit.id=64ab0a8e
> Note that Drill returns the same value for SUM(c2) in the below query.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
> +---------------+
> |     w_sum     |
> +---------------+
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> ...
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> | -16323034011  |
> +---------------+
> 145 rows selected (0.683 seconds)
> {noformat}
> Postgres 9.3
> {noformat}
> postgres=#   select SUM(c2) OVER w as w_sum from ( SELECT * FROM t_alltype ORDER BY c1,c2 DESC NULLS LAST ) subquery WINDOW w AS (PARTITION BY c8 ORDER BY c2 DESC NULLS FIRST RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
>     w_sum
> -------------
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  -8995559793
>  ...
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  -9169127412
>  ...
>   1841653194
>   1841653194
>   1841653194
>   1841653194
>   1841653194
> (145 rows)
> {noformat}
> Both in Drill and on Postgres there are 142 distinct rows and three nulls of the 145 total rows, in column c2.
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select count(distinct c2) from t_alltype;
> +---------+
> | EXPR$0  |
> +---------+
> | 142     |
> +---------+
> 1 row selected (0.682 seconds)
> 0: jdbc:drill:schema=dfs.tmp> select c2 from t_alltype where c2 is null;
> +-------+
> |  c2   |
> +-------+
> | null  |
> | null  |
> | null  |
> +-------+
> 3 rows selected (0.389 seconds)
> {noformat}
> {noformat}
> postgres=#  select count(distinct c2) from t_alltype;
>  count
> -------
>    142
> (1 row)
> postgres=# select c2 from t_alltype where c2 is null;
>  c2
> ----
> (3 rows)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)