You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2019/01/15 07:12:00 UTC
[jira] [Commented] (DRILL-6146) UNION with empty input on any one
side returns incorrect results
[ https://issues.apache.org/jira/browse/DRILL-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742818#comment-16742818 ]
Khurram Faraaz commented on DRILL-6146:
---------------------------------------
The issue can be reproduced on Drill 1.14.0.
When LHS has an empty table ( in this case the directory empty_dir has an empty file f1.json )
Whereas, when we flip the LHS and RHS in the failing query, it returns a result, with one record, having the value 1
{noformat}
[test@test102-45 ~]# hadoop fs -ls /tmp/empty_dir
Found 1 items
-rwxr-xr-x 3 test test 0 2019-01-15 06:59 /tmp/empty_dir/f1.json
failing query => select id from dfs.tmp.`empty_dir` union select * from (values(1));
00-00 Screen : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {8.1 rows, 20.1 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1216
00-01 Project(id=[$0]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {8.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1215
00-02 StreamAgg(group=[{0}]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {7.0 rows, 19.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1214
00-03 Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY id): rowcount = 2.0, cumulative cost = {5.0 rows, 11.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1213
00-04 UnionAll(all=[true]) : rowType = RecordType(ANY id): rowcount = 2.0, cumulative cost = {3.0 rows, 3.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1212
00-06 Scan(table=[[dfs, tmp, empty_dir]], groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/empty_dir, numFiles=1, columns=[`id`], files=[maprfs:///tmp/empty_dir/f1.json]]]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1210
00-05 Values(tuples=[[{ 1 }]]) : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1211{noformat}
> UNION with empty input on any one side returns incorrect results
> ----------------------------------------------------------------
>
> Key: DRILL-6146
> URL: https://issues.apache.org/jira/browse/DRILL-6146
> Project: Apache Drill
> Issue Type: Bug
> Components: Query Planning & Optimization
> Affects Versions: 1.12.0
> Reporter: Khurram Faraaz
> Assignee: Vitalii Diravka
> Priority: Major
>
> When any one side of the UNION has an empty file as input, Drill returns incorrect results.
>
> table t3 does not have any data inserted into its rows. Postgress returns 1 as the result for both the queries, whereas Drill does not.
>
> {noformat}
> postgres=# create table t3(id int, name varchar(25));
> CREATE TABLE
> postgres=# select * from (values(1)) t union select id from t3;
> 1
>
> postgres=# select id from t3 union select * from (values(1)) t;
> 1
> {noformat}
>
>
> Results from Drill 1.12.0-mapr, note we return result 1 as result after the union.
> We have a directory named empty_JSON_f , and it has a single empty JSON file (that JSON file has no content in it, it is empty).
>
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from (values(1)) UNION select id from empty_JSON_f;
> +---------+
> | EXPR$0 |
> +---------+
> | 1 |
> +---------+
> 1 row selected (2.272 seconds){noformat}
> However, in this query we return null and loose the value 1 from the right hand side, after the union, this doesn't seem correct
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select id from empty_JSON_f UNION select * from (values(1));
> +-------+
> | id |
> +-------+
> | null |
> +-------+
> 1 row selected (0.33 seconds){noformat}
>
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)