You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Jose Miguel Suarez (JIRA)" <ji...@apache.org> on 2018/06/08 17:41:00 UTC
[jira] [Created] (DRILL-6483) Adding COALESCE column breaks FULL
OUTER JOIN expected results
Jose Miguel Suarez created DRILL-6483:
-----------------------------------------
Summary: Adding COALESCE column breaks FULL OUTER JOIN expected results
Key: DRILL-6483
URL: https://issues.apache.org/jira/browse/DRILL-6483
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.13.0
Environment: Windows 10 (1803) Pro 64bit
Using Drill in local mode, querying from a JDBC client (DBeaver)
tables are in the local filesystem with a storage plugin setup using headers:
{code:JSON}
"csv": {
"type": "text",
"extensions": [
"csv"
],
"extractHeader": true,
"delimiter": ","
{code}
Reporter: Jose Miguel Suarez
When a calculated column using coalesce (or equivalent case statement) is added to a query statement, an is null logic test returns unexpected results:
Example:
Consider two tables,
Table_A
||id||amount||
|1|100|
|2|200|
|3|300|
|4|400|
Table_B
||id||amount||
|3|30|
|4|40|
|5|50|
|6|60|
Running the following query:
{code:sql}
SELECT
A.id a_id, A.amount a_amount, B.id b_id, B.amount b_amount,
a.id IS NULL a_id_is_null, a.id IS NOT NULL a_id_is_not_null,
b.id IS NULL b_id_is_null, b.id IS NOT NULL b_id_is_not_null
FROM `Table_A.csv` A
FULL OUTER JOIN `Table_B.csv` B ON B.id = A.id
;
{code}
returns the expected results:
||a_id||a_amount||b_id||b_amount||a_id_is_null||a_id_is_not_null||b_id_is_null||b_id_is_not_null||
|1|100| | |false|true|true|false|
|2|200| | |false|true|true|false|
|3|300|3|30|false|true|false|true|
|4|400|4|40|false|true|false|true|
| | |5|50|true|false|false|true|
| | |6|60|true|false|false|true|
But if a coalesce column is added to the query:
{code:sql}
SELECT
A.id a_id, A.amount a_amount, B.id b_id, B.amount b_amount,
COALESCE(A.id, B.id) id,
a.id IS NULL a_id_is_null, a.id IS NOT NULL a_id_is_not_null,
b.id IS NULL b_id_is_null, b.id IS NOT NULL b_id_is_not_null
FROM `Table_A.csv` A
FULL OUTER JOIN `Table_B.csv` B ON B.id = A.id
;
{code}
then the results become incorrect:
||a_id||a_amount||b_id||b_amount||a_id_is_null||a_id_is_not_null||b_id_is_null||b_id_is_not_null||
|1|100| | |false|true|true|false|
|2|200| | |false|true|true|false|
|3|300|3|30|false|true|false|true|
|4|400|4|40|false|true|false|true|
| | |5|50|*{color:#d04437}_false_{color}*|false|false|true|
| | |6|60|*{color:#d04437}_false_{color}*|false|false|true|
because the Boolean values of the last two rows of column "a_id_is_null" should evaluate to true. The affected column depends of the order of the arguments of the coalesce function, if the field from table B was first, the wrong results would turn up in the "b_id_is_null" column.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)