You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2015/05/14 20:40:00 UTC

[jira] [Resolved] (DRILL-3087) Union All returns incorrect results.

     [ https://issues.apache.org/jira/browse/DRILL-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Khurram Faraaz resolved DRILL-3087.
-----------------------------------
    Resolution: Invalid

the two queries are different on RHS and LHS of Union All.

> Union All returns incorrect results.
> ------------------------------------
>
>                 Key: DRILL-3087
>                 URL: https://issues.apache.org/jira/browse/DRILL-3087
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.0.0
>         Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726 
>            Reporter: Khurram Faraaz
>            Assignee: Jinfeng Ni
>            Priority: Critical
>
> Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.
> Total number of records in the parquet file are,
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 26212355   |
> +------------+
> 1 row selected (0.17 seconds)
> {code}
> Results returned by aggregate + group by query.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2;
> +------------+------------+
> |  max_key1  |    key2    |
> +------------+------------+
> | 1.42931543226E9 | m          |
> | 1.42931281008E9 | e          |
> | 1.42931334853E9 | k          |
> | 1.42931595791E9 | n          |
> | 1.42435032101E9 | 1          |
> | 1.42931539809E9 | l          |
> | 1.42931336919E9 | c          |
> | 1.40095133379E9 | 0          |
> | 1.42931553374E9 | h          |
> | 1.42931539751E9 | j          |
> | 1.42931480081E9 | f          |
> | 1.42931347924E9 | b          |
> | 1.42931509068E9 | g          |
> | 1.42931380603E9 | d          |
> | 1.42931626355E9 | a          |
> | 1.42931450347E9 | i          |
> +------------+------------+
> 16 rows selected (19.262 seconds)
> {code}
> There Union All results are incorrect.
> {code}
> 0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
> +------------+------------+
> |  max_key1  |    key2    |
> +------------+------------+
> | 1.42931380603E9 | d          |
> | 1.42931336919E9 | c          |
> | 1.42931347924E9 | b          |
> | 1.42435032101E9 | 1          |
> | 1.42931626355E9 | a          |
> | 1.40095133379E9 | 0          |
> | 1.42931334853E9 | k          |
> | 1.42931543226E9 | m          |
> | 1.42931539751E9 | j          |
> | 1.42931553374E9 | h          |
> | 1.42931281008E9 | e          |
> | 1.42931595791E9 | n          |
> | 1.42931509068E9 | g          |
> | 1.42931480081E9 | f          |
> | 1.42931539809E9 | l          |
> | 1.42931450347E9 | i          |
> | 1870700.0  | d          |
> | 1871853.0  | c          |
> | 1870682.0  | b          |
> | 180.0      | 1          |
> | 1872286.0  | a          |
> | 156.0      | 0          |
> | 1873931.0  | k          |
> | 1874177.0  | m          |
> | 1872252.0  | j          |
> | 1872734.0  | h          |
> | 1871510.0  | e          |
> | 1874465.0  | n          |
> | 1873607.0  | g          |
> | 1870453.0  | f          |
> | 1870933.0  | l          |
> | 1872436.0  | i          |
> +------------+------------+
> 32 rows selected (39.183 seconds)
> {code}
> Physical query plan
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      UnionAll(all=[true])
> 00-03        Project(max_key1=[$1], key2=[$0])
> 00-05          HashAgg(group=[{0}], max_key1=[MAX($1)])
> 00-07            HashAgg(group=[{0}], max_key1=[MAX($1)])
> 00-09              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
> 00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
> 00-02        Project(max_key1=[$1], key2=[$0])
> 00-04          HashAgg(group=[{0}], max_key1=[$SUM0($1)])
> 00-06            HashAgg(group=[{0}], max_key1=[COUNT($1)])
> 00-08              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
> 00-10                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)