You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2018/02/07 01:09:00 UTC
[jira] [Created] (DRILL-6141) JOIN query that uses USING clause returns incorrect results

Khurram Faraaz created DRILL-6141:
-------------------------------------

             Summary: JOIN query that uses USING clause returns incorrect results
                 Key: DRILL-6141
                 URL: https://issues.apache.org/jira/browse/DRILL-6141
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning &amp; Optimization
    Affects Versions: 1.12.0
            Reporter: Khurram Faraaz


Join query that uses USING clause returns incorrect results.

Postgres 9.2.23 returns only one occurrence of the "id" column

{noformat}
postgres=# create table t1(id int, name varchar(30));
CREATE TABLE
postgres=# create table t2(id int, name varchar(30));
CREATE TABLE

postgres=# select * from t1;
 id | name
----+-------
 10 | John
 13 | Kevin
 15 | Susan
(3 rows)

postgres=# select * from t2;
 id | name
----+-------
 19 | Kyle
 13 | Kevin
 1 | Bob
 17 | Kumar
(4 rows)

postgres=# select * from t1 JOIN t2 USING(id);
 id | name | name
----+-------+-------
 13 | Kevin | Kevin
(1 row)

{noformat}

results from Drill 1.12.0-mapr commit : 2de42491be795721bcb4059bd46e27fc33272309

{noformat}


0: jdbc:drill:schema=dfs.tmp> create table t1 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t1.csv`;
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 3 |
+-----------+----------------------------+
1 row selected (0.213 seconds)
0: jdbc:drill:schema=dfs.tmp> create table t2 as select cast(columns[0] as int) c1, cast(columns[1] as varchar(30)) c2 from `t2.csv`;
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 4 |
+-----------+----------------------------+
1 row selected (0.168 seconds)

0: jdbc:drill:schema=dfs.tmp> select * from t1;
+-----+--------+
| c1 | c2 |
+-----+--------+
| 10 | John |
| 13 | Kevin |
| 15 | Susan |
+-----+--------+
3 rows selected (0.15 seconds)
0: jdbc:drill:schema=dfs.tmp> select * from t2;
+-----+--------+
| c1 | c2 |
+-----+--------+
| 19 | Kyle |
| 13 | Kevin |
| 1 | Bob |
| 17 | Kumar |
+-----+--------+
4 rows selected (0.171 seconds)

## Note that Drill returns an extra column, unlike Postgres, for the same query over same data

0: jdbc:drill:schema=dfs.tmp> select * from t1 JOIN t2 USING(c1);
+-----+--------+------+--------+
| c1 | c2 | c10 | c20 |
+-----+--------+------+--------+
| 13 | Kevin | 13 | Kevin |
+-----+--------+------+--------+
1 row selected (0.256 seconds)

## explain plan for above query

0: jdbc:drill:schema=dfs.tmp> explain plan for select * from t1 JOIN t2 USING(c1);
+------+------+
| text | json |
+------+------+
| 00-00 Screen
00-01 ProjectAllowDup(*=[$0], *0=[$1])
00-02 Project(T49¦¦*=[$0], T48¦¦*=[$2])
00-03 Project(T49¦¦*=[$2], c10=[$3], T48¦¦*=[$0], c1=[$1])
00-04 HashJoin(condition=[=($3, $1)], joinType=[inner])
00-06 Project(T48¦¦*=[$0], c1=[$1])
00-08 Scan(table=[[dfs, tmp, t2]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t2]], selectionRoot=maprfs:/tmp/t2, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])
00-05 Project(T49¦¦*=[$0], c10=[$1])
00-07 Project(T49¦¦*=[$0], c1=[$1])
00-09 Scan(table=[[dfs, tmp, t1]], groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/t1]], selectionRoot=maprfs:/tmp/t1, numFiles=1, numRowGroups=1, usedMetadataFile=false, columns=[`*`]]])

{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)