You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Jose Miguel Suarez (JIRA)" <ji...@apache.org> on 2018/06/08 17:41:00 UTC

[jira] [Created] (DRILL-6483) Adding COALESCE column breaks FULL OUTER JOIN expected results

Jose Miguel Suarez created DRILL-6483:
-----------------------------------------

             Summary: Adding COALESCE column breaks FULL OUTER JOIN expected results
                 Key: DRILL-6483
                 URL: https://issues.apache.org/jira/browse/DRILL-6483
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 1.13.0
         Environment: Windows 10 (1803) Pro 64bit

Using Drill in local mode, querying from a JDBC client (DBeaver)

tables are in the local filesystem with a storage plugin setup using headers:
{code:JSON}
    "csv": {
      "type": "text",
      "extensions": [
        "csv"
      ],
      "extractHeader": true,
      "delimiter": ","
{code}
            Reporter: Jose Miguel Suarez


When a calculated column using coalesce (or equivalent case statement) is added to a query statement, an is null logic test returns unexpected results:

Example:

Consider two tables,

Table_A
||id||amount||
|1|100|
|2|200|
|3|300|
|4|400|

 

Table_B
||id||amount||
|3|30|
|4|40|
|5|50|
|6|60|

 

Running the following query:

 
{code:sql}
SELECT
    A.id a_id, A.amount a_amount, B.id b_id, B.amount b_amount, 
    a.id IS NULL a_id_is_null, a.id IS NOT NULL a_id_is_not_null,
    b.id IS NULL b_id_is_null, b.id IS NOT NULL b_id_is_not_null
FROM  `Table_A.csv` A
FULL OUTER JOIN `Table_B.csv` B ON B.id = A.id
;
{code}
returns the expected results:

 

 
||a_id||a_amount||b_id||b_amount||a_id_is_null||a_id_is_not_null||b_id_is_null||b_id_is_not_null||
|1|100| | |false|true|true|false|
|2|200| | |false|true|true|false|
|3|300|3|30|false|true|false|true|
|4|400|4|40|false|true|false|true|
| | |5|50|true|false|false|true|
| | |6|60|true|false|false|true|

 

But if a coalesce column is added to the query:
{code:sql}
SELECT
    A.id a_id, A.amount a_amount, B.id b_id, B.amount b_amount,
    COALESCE(A.id, B.id) id,
    a.id IS NULL a_id_is_null, a.id IS NOT NULL a_id_is_not_null,
    b.id IS NULL b_id_is_null, b.id IS NOT NULL b_id_is_not_null
FROM  `Table_A.csv` A
FULL OUTER JOIN `Table_B.csv` B ON B.id = A.id
;
{code}
then the results become incorrect:
||a_id||a_amount||b_id||b_amount||a_id_is_null||a_id_is_not_null||b_id_is_null||b_id_is_not_null||
|1|100| | |false|true|true|false|
|2|200| | |false|true|true|false|
|3|300|3|30|false|true|false|true|
|4|400|4|40|false|true|false|true|
| | |5|50|*{color:#d04437}_false_{color}*|false|false|true|
| | |6|60|*{color:#d04437}_false_{color}*|false|false|true|

because the Boolean values of the last two rows of column "a_id_is_null" should evaluate to true. The affected column depends of the order of the arguments of the coalesce function, if the field from table B was first, the wrong results would turn up in the "b_id_is_null" column.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)