You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2015/05/14 20:37:00 UTC

[jira] [Created] (DRILL-3087) Union All returns incorrect results.

Khurram Faraaz created DRILL-3087:
-------------------------------------

             Summary: Union All returns incorrect results.
                 Key: DRILL-3087
                 URL: https://issues.apache.org/jira/browse/DRILL-3087
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.0.0
         Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726 
            Reporter: Khurram Faraaz
            Assignee: Jinfeng Ni
            Priority: Critical


Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.

Total number of records in the parquet file are,

{code}
0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`;
+------------+
|   EXPR$0   |
+------------+
| 26212355   |
+------------+
1 row selected (0.17 seconds)
{code}

Results returned by aggregate + group by query.

{code}
0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2;
+------------+------------+
|  max_key1  |    key2    |
+------------+------------+
| 1.42931543226E9 | m          |
| 1.42931281008E9 | e          |
| 1.42931334853E9 | k          |
| 1.42931595791E9 | n          |
| 1.42435032101E9 | 1          |
| 1.42931539809E9 | l          |
| 1.42931336919E9 | c          |
| 1.40095133379E9 | 0          |
| 1.42931553374E9 | h          |
| 1.42931539751E9 | j          |
| 1.42931480081E9 | f          |
| 1.42931347924E9 | b          |
| 1.42931509068E9 | g          |
| 1.42931380603E9 | d          |
| 1.42931626355E9 | a          |
| 1.42931450347E9 | i          |
+------------+------------+
16 rows selected (19.262 seconds)
{code}

There Union All results are incorrect.

{code}
0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
|  max_key1  |    key2    |
+------------+------------+
| 1.42931380603E9 | d          |
| 1.42931336919E9 | c          |
| 1.42931347924E9 | b          |
| 1.42435032101E9 | 1          |
| 1.42931626355E9 | a          |
| 1.40095133379E9 | 0          |
| 1.42931334853E9 | k          |
| 1.42931543226E9 | m          |
| 1.42931539751E9 | j          |
| 1.42931553374E9 | h          |
| 1.42931281008E9 | e          |
| 1.42931595791E9 | n          |
| 1.42931509068E9 | g          |
| 1.42931480081E9 | f          |
| 1.42931539809E9 | l          |
| 1.42931450347E9 | i          |
| 1870700.0  | d          |
| 1871853.0  | c          |
| 1870682.0  | b          |
| 180.0      | 1          |
| 1872286.0  | a          |
| 156.0      | 0          |
| 1873931.0  | k          |
| 1874177.0  | m          |
| 1872252.0  | j          |
| 1872734.0  | h          |
| 1871510.0  | e          |
| 1874465.0  | n          |
| 1873607.0  | g          |
| 1870453.0  | f          |
| 1870933.0  | l          |
| 1872436.0  | i          |
+------------+------------+
32 rows selected (39.183 seconds)
{code}

Physical query plan

{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionAll(all=[true])
00-03        Project(max_key1=[$1], key2=[$0])
00-05          HashAgg(group=[{0}], max_key1=[MAX($1)])
00-07            HashAgg(group=[{0}], max_key1=[MAX($1)])
00-09              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-11                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
00-02        Project(max_key1=[$1], key2=[$0])
00-04          HashAgg(group=[{0}], max_key1=[$SUM0($1)])
00-06            HashAgg(group=[{0}], max_key1=[COUNT($1)])
00-08              Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-10                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)