You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2019/01/15 07:12:00 UTC
[jira] [Commented] (DRILL-6146) UNION with empty input on any one side returns incorrect results

    [ https://issues.apache.org/jira/browse/DRILL-6146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742818#comment-16742818 ] 

Khurram Faraaz commented on DRILL-6146:
---------------------------------------

The issue can be reproduced on Drill 1.14.0.

When LHS has an empty table ( in this case the directory empty_dir has an empty file f1.json )

Whereas, when we flip the LHS and RHS in the failing query, it returns a result, with one record, having the value 1
{noformat}
[test@test102-45 ~]# hadoop fs -ls /tmp/empty_dir
Found 1 items
-rwxr-xr-x 3 test test 0 2019-01-15 06:59 /tmp/empty_dir/f1.json

failing query => select id from dfs.tmp.`empty_dir` union select * from (values(1));

00-00    Screen : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {8.1 rows, 20.1 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1216
00-01      Project(id=[$0]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {8.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1215
00-02        StreamAgg(group=[{0}]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {7.0 rows, 19.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1214
00-03          Sort(sort0=[$0], dir0=[ASC]) : rowType = RecordType(ANY id): rowcount = 2.0, cumulative cost = {5.0 rows, 11.0 cpu, 0.0 io, 0.0 network, 16.0 memory}, id = 1213
00-04            UnionAll(all=[true]) : rowType = RecordType(ANY id): rowcount = 2.0, cumulative cost = {3.0 rows, 3.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1212
00-06              Scan(table=[[dfs, tmp, empty_dir]], groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/empty_dir, numFiles=1, columns=[`id`], files=[maprfs:///tmp/empty_dir/f1.json]]]) : rowType = RecordType(ANY id): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1210
00-05              Values(tuples=[[{ 1 }]]) : rowType = RecordType(INTEGER EXPR$0): rowcount = 1.0, cumulative cost = {1.0 rows, 1.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 1211{noformat}

> UNION with empty input on any one side returns incorrect results
> ----------------------------------------------------------------
>
>                 Key: DRILL-6146
>                 URL: https://issues.apache.org/jira/browse/DRILL-6146
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning &amp; Optimization
>    Affects Versions: 1.12.0
>            Reporter: Khurram Faraaz
>            Assignee: Vitalii Diravka
>            Priority: Major
>
> When any one side of the UNION has an empty file as input, Drill returns incorrect results.
>  
> table t3 does not have any data inserted into its rows. Postgress returns 1 as the result for both the queries, whereas Drill does not.
>  
> {noformat}
> postgres=# create table t3(id int, name varchar(25));
> CREATE TABLE 
> postgres=# select * from (values(1)) t union select id from t3;
>        1
>  
> postgres=# select id from t3 union select * from (values(1)) t;
>   1
>  {noformat}
>  
>  
> Results from Drill 1.12.0-mapr, note we return result 1 as result after the union.
> We have a directory named empty_JSON_f , and it has a single empty JSON file (that JSON file has no content in it, it is empty).
>  
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select * from (values(1)) UNION select id from empty_JSON_f;
> +---------+
> | EXPR$0  |
> +---------+
> | 1       |
> +---------+
> 1 row selected (2.272 seconds){noformat}
> However, in this query we return null and loose the value 1 from the right hand side, after the union, this doesn't seem correct 
> {noformat}
> 0: jdbc:drill:schema=dfs.tmp> select id from empty_JSON_f UNION select * from (values(1));
> +-------+
> |  id   |
> +-------+
> | null  |
> +-------+
> 1 row selected (0.33 seconds){noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)