You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2015/05/14 20:37:00 UTC
[jira] [Created] (DRILL-3087) Union All returns incorrect results.
Khurram Faraaz created DRILL-3087:
-------------------------------------
Summary: Union All returns incorrect results.
Key: DRILL-3087
URL: https://issues.apache.org/jira/browse/DRILL-3087
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.0.0
Environment: 21cc578b6b8c8f3ca1ebffd3dbb92e35d68bc726
Reporter: Khurram Faraaz
Assignee: Jinfeng Ni
Priority: Critical
Union All returns incorrect results when the same aggregate + group by query is used on both sides of Union All. Test was run from sqlline on 4 node cluster on CentOS.
Total number of records in the parquet file are,
{code}
0: jdbc:drill:schema=dfs.tmp> select count(*) from `twokeyParq_02/0_0_0.parquet`;
+------------+
| EXPR$0 |
+------------+
| 26212355 |
+------------+
1 row selected (0.17 seconds)
{code}
Results returned by aggregate + group by query.
{code}
0: jdbc:drill:schema=dfs.tmp> select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2;
+------------+------------+
| max_key1 | key2 |
+------------+------------+
| 1.42931543226E9 | m |
| 1.42931281008E9 | e |
| 1.42931334853E9 | k |
| 1.42931595791E9 | n |
| 1.42435032101E9 | 1 |
| 1.42931539809E9 | l |
| 1.42931336919E9 | c |
| 1.40095133379E9 | 0 |
| 1.42931553374E9 | h |
| 1.42931539751E9 | j |
| 1.42931480081E9 | f |
| 1.42931347924E9 | b |
| 1.42931509068E9 | g |
| 1.42931380603E9 | d |
| 1.42931626355E9 | a |
| 1.42931450347E9 | i |
+------------+------------+
16 rows selected (19.262 seconds)
{code}
There Union All results are incorrect.
{code}
0: jdbc:drill:schema=dfs.tmp> select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
| max_key1 | key2 |
+------------+------------+
| 1.42931380603E9 | d |
| 1.42931336919E9 | c |
| 1.42931347924E9 | b |
| 1.42435032101E9 | 1 |
| 1.42931626355E9 | a |
| 1.40095133379E9 | 0 |
| 1.42931334853E9 | k |
| 1.42931543226E9 | m |
| 1.42931539751E9 | j |
| 1.42931553374E9 | h |
| 1.42931281008E9 | e |
| 1.42931595791E9 | n |
| 1.42931509068E9 | g |
| 1.42931480081E9 | f |
| 1.42931539809E9 | l |
| 1.42931450347E9 | i |
| 1870700.0 | d |
| 1871853.0 | c |
| 1870682.0 | b |
| 180.0 | 1 |
| 1872286.0 | a |
| 156.0 | 0 |
| 1873931.0 | k |
| 1874177.0 | m |
| 1872252.0 | j |
| 1872734.0 | h |
| 1871510.0 | e |
| 1874465.0 | n |
| 1873607.0 | g |
| 1870453.0 | f |
| 1870933.0 | l |
| 1872436.0 | i |
+------------+------------+
32 rows selected (39.183 seconds)
{code}
Physical query plan
{code}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from (select max(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2 union all select count(cast(key1 as double)) max_key1, key2 from `twokeyParq_02/0_0_0.parquet` group by key2);
+------------+------------+
| text | json |
+------------+------------+
| 00-00 Screen
00-01 UnionAll(all=[true])
00-03 Project(max_key1=[$1], key2=[$0])
00-05 HashAgg(group=[{0}], max_key1=[MAX($1)])
00-07 HashAgg(group=[{0}], max_key1=[MAX($1)])
00-09 Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-11 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
00-02 Project(max_key1=[$1], key2=[$0])
00-04 HashAgg(group=[{0}], max_key1=[$SUM0($1)])
00-06 HashAgg(group=[{0}], max_key1=[COUNT($1)])
00-08 Project(key2=[$0], $f1=[CAST($1):DOUBLE])
00-10 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/tmp/twokeyParq_02/0_0_0.parquet]], selectionRoot=/tmp/twokeyParq_02/0_0_0.parquet, numFiles=1, columns=[`key2`, `key1`]]])
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)