You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2017/03/10 12:33:04 UTC

[jira] [Commented] (DRILL-5347) Join query returns incorrect results, returns an extra column.

    [ https://issues.apache.org/jira/browse/DRILL-5347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15904995#comment-15904995 ] 

Khurram Faraaz commented on DRILL-5347:
---------------------------------------

Query plan for the join query that returns an extra column.

{noformat}
0: jdbc:drill:schema=dfs.tmp> explain plan for select * from using_l1 JOIN using_r1 USING(c1);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      ProjectAllowDup(*=[$0], *0=[$1])
00-02        Project(T132¦¦*=[$0], T131¦¦*=[$2])
00-03          Project(T132¦¦*=[$2], c10=[$3], T131¦¦*=[$0], c1=[$1])
00-04            HashJoin(condition=[=($3, $1)], joinType=[inner])
00-05              Project(T132¦¦*=[$0], c10=[$1])
00-07                Project(T132¦¦*=[$0], c1=[$1])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_l1]], selectionRoot=maprfs:/tmp/using_l1, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
00-06              Project(T131¦¦*=[$0], c1=[$1])
00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///tmp/using_r1]], selectionRoot=maprfs:/tmp/using_r1, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
{noformat}

> Join query returns incorrect results, returns an extra column.
> --------------------------------------------------------------
>
>                 Key: DRILL-5347
>                 URL: https://issues.apache.org/jira/browse/DRILL-5347
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.10.0
>            Reporter: Khurram Faraaz
>            Priority: Critical
>         Attachments: using_l1.txt, using_r1.txt
>
>
> Join query returns incorrect results, returns an extra column.
> Note that the columns in both the parquet files are in different order.
> Postgres returns correct results.
> {noformat}
> postgres=# create table using_l1(c1 int, c2 varchar(25), c3 date);
> CREATE TABLE
> postgres=# create table using_r1(c2 varchar(25), c3 date, c1 int);
> CREATE TABLE
> postgres=# copy using_l1 from '/root/using_clause/using_l1.txt' DELIMITER ',' CSV;
> COPY 17
> postgres=# copy using_r1 from '/root/using_clause/using_r1.txt' DELIMITER ',' CSV;
> COPY 18
> postgres=# select * from using_l1 JOIN using_r1 USING(c1);
>   c1   |    c2     |     c3     |     c2     |     c3
> -------+-----------+------------+------------+------------
>      1 | test      | 1982-09-11 | teststring | 2009-10-10
>      1 | test      | 1982-09-11 | yes        | 2001-09-09
>      2 | string    | 1989-04-12 | never      | 1989-11-11
>      3 | foo       | 2000-10-12 | none       | 1967-03-05
>      3 | foo       | 2000-10-12 | here       | 2011-09-09
>      4 | foobar    | 2015-01-07 | champ      | 2008-06-07
>      5 | bar       | 2004-05-08 | bar        | 2010-02-04
>     17 | barfoobar | 1992-12-06 | hello      | 2014-07-02
>     19 | hello     | 1980-10-10 | hi         | 2000-01-01
>      0 | now       | 2007-07-02 | foobar     | 2011-01-01
>     10 | here      | 2009-11-11 | done       | 1947-08-14
>  99999 | there     | 1993-09-03 | there      | 1993-09-03
>     22 | hi        | 2000-01-04 | there      | 2016-03-22
>     13 | no        | 2016-10-01 | foo        | 1982-10-11
>     13 | no        | 2016-10-01 | no         | 1992-12-06
> (15 rows)
> {noformat}
> Results from Drill 1.10.0 (3dfb4972), for the same input data we see an extra column c10 in the results of join query.
> {noformat}
> ## using_l1 => (c1 INTEGER, c2 VARCHAR(25), c3 DATE)
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE using_l1 AS SELECT CAST(columns[0] AS INTEGER) c1, CAST(columns[1] AS VARCHAR(25)) c2, CAST(columns[2] AS DATE) c3 FROM `using_l1.txt`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 17                         |
> +-----------+----------------------------+
> 1 row selected (0.258 seconds)
> ## using_r1 => (c2 VARCHAR, c3 DATE, c1 INTEGER)
> 0: jdbc:drill:schema=dfs.tmp> CREATE TABLE using_r1 AS SELECT CAST(columns[0] AS VARCHAR(25)) c2, CAST(columns[1] AS DATE) c3, CAST(columns[2] AS INTEGER) c1 FROM `using_r1.txt`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 18                         |
> +-----------+----------------------------+
> 1 row selected (0.249 seconds)
> 0: jdbc:drill:schema=dfs.tmp>
> ## Last column c10 is un-expected in the results of the JOIN
> 0: jdbc:drill:schema=dfs.tmp> select * from using_l1 JOIN using_r1 USING(c1);
> +--------+------------+-------------+-------------+-------------+--------+
> |   c1   |     c2     |     c3      |     c20     |     c30     |  c10   |
> +--------+------------+-------------+-------------+-------------+--------+
> | 1      | test       | 1982-09-11  | yes         | 2001-09-09  | 1      |
> | 1      | test       | 1982-09-11  | teststring  | 2009-10-10  | 1      |
> | 2      | string     | 1989-04-12  | never       | 1989-11-11  | 2      |
> | 3      | foo        | 2000-10-12  | here        | 2011-09-09  | 3      |
> | 3      | foo        | 2000-10-12  | none        | 1967-03-05  | 3      |
> | 4      | foobar     | 2015-01-07  | champ       | 2008-06-07  | 4      |
> | 5      | bar        | 2004-05-08  | bar         | 2010-02-04  | 5      |
> | 17     | barfoobar  | 1992-12-06  | hello       | 2014-07-02  | 17     |
> | 19     | hello      | 1980-10-10  | hi          | 2000-01-01  | 19     |
> | 0      | now        | 2007-07-02  | foobar      | 2011-01-01  | 0      |
> | 10     | here       | 2009-11-11  | done        | 1947-08-14  | 10     |
> | 99999  | there      | 1993-09-03  | there       | 1993-09-03  | 99999  |
> | 22     | hi         | 2000-01-04  | there       | 2016-03-22  | 22     |
> | 13     | no         | 2016-10-01  | no          | 1992-12-06  | 13     |
> | 13     | no         | 2016-10-01  | foo         | 1982-10-11  | 13     |
> +--------+------------+-------------+-------------+-------------+--------+
> 15 rows selected (0.358 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)