You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2015/01/08 03:09:34 UTC

[jira] [Created] (DRILL-1957) NULL values in NOT IN sub-queries could lead to incorrect query result.

Jinfeng Ni created DRILL-1957:
---------------------------------

             Summary: NULL values in NOT IN sub-queries could lead to incorrect query result. 
                 Key: DRILL-1957
                 URL: https://issues.apache.org/jira/browse/DRILL-1957
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


NULL values in NOT IN sub-queries will disqualify all the rows, and hence return 0 row for a query. However, Drill could return some rows, due to issue in the query planning.

For example, consider the following sample data :
{code}
select deptno from dfs.`/Users/jni/work/data/json/e.json` ;
+------------+
|   deptno   |
+------------+
| 10         |
| 20         |
| null       |
+------------+
3 rows selected (0.062 seconds)
0: jdbc:drill:zk=local> select deptno from dfs.`/Users/jni/work/data/json/d.json` ;
+------------+
|   deptno   |
+------------+
| 10         |
| null       |
+------------+
2 rows selected (0.062 seconds)
{code}

The following query should return 0 row, but Drill actually return 1 row.

{code}
select deptno from dfs.`/Users/jni/work/data/json/e.json` where deptno not in (select deptno from dfs.`/Users/jni/work/data/json/d.json`);
+------------+
|   deptno   |
+------------+
| 20         |
+------------+
1 row selected (0.286 seconds)
{code}

In comparison, here is the result run on posture.
{code}
mydb=# select * from e;
 deptno | ename
--------+-------
     10 | Alice
   NULL | Tom
     20 | Bob
(3 rows)

mydb=# delete from d where deptno = 20;
DELETE 1
mydb=# select * from d;
 deptno | dname
--------+-------
     10 | Sales
   NULL | Eng
(2 rows)

mydb=# select deptno from e where deptno not in (select deptno from d);
 deptno
--------
(0 rows)
{code}

The fix seems to be in OPTIQ-373, which will produce a valid plan. However, such plan contain a cross-join, which is not supported Drill, and hence hit CanNotPlanException in Drill.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)