Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/07/01 19:22:04 UTC

[jira] [Commented] (DRILL-2235) Assert when NOT IN clause contains multiple columns

    [ https://issues.apache.org/jira/browse/DRILL-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610664#comment-14610664 ] 

Aman Sinha commented on DRILL-2235:
-----------------------------------

This query now plans successfully, because Drill has supported NestedLoopJoin with scalar subqueries since 1.0. Here is a plan for a query against TPC-H:

{code}
0: jdbc:drill:zk=local> explain plan for select n1.n_name from cp.`tpch/nation.parquet` n1 where (n1.n_nationkey, n1.n_regionkey) not in (select n2.n_nationkey, n2.n_regionkey from cp.`tpch/nation.parquet` n2 where n2.n_regionkey < 10);
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(n_name=[$0])
00-02        SelectionVectorRemover
00-03          Filter(condition=[NOT(CASE(=($1, 0), false, IS NOT NULL($7), true, IS NULL($3), null, IS NULL($4), null, <($2, $1), null, false))])
00-04            HashJoin(condition=[AND(=($3, $5), =($4, $6))], joinType=[left])
00-06              Project(n_name=[$2], $f0=[$3], $f1=[$4], f5=[$0], f6=[$1])
00-08                NestedLoopJoin(condition=[true], joinType=[inner])
00-11                  Project(n_nationkey=[$2], n_regionkey=[$0], n_name=[$1])
00-14                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`, `n_name`]]])
00-10                  StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0, $1)])
00-13                    Project($f0=[$0], $f1=[$1], $f2=[true])
00-16                      SelectionVectorRemover
00-18                        Filter(condition=[<($1, 10)])
00-20                          Project(n_nationkey=[$1], n_regionkey=[$0])
00-21                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`]]])
00-05              Project($f00=[$0], $f10=[$1], $f2=[$2])
00-07                HashAgg(group=[{0, 1}], agg#0=[MIN($2)])
00-09                  Project($f0=[$0], $f1=[$1], $f2=[true])
00-12                    SelectionVectorRemover
00-15                      Filter(condition=[<($1, 10)])
00-17                        Project(n_nationkey=[$1], n_regionkey=[$0])
00-19                          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`]]])
{code} 

However, note that the StreamAgg is computing COUNT($0, $1); it seems Calcite generates this aggregate expression. I am not sure what the semantics of count(a, b) are. Running this query fails during execution because we don't support this function.
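
One plausible reading of COUNT($0, $1), offered as an assumption rather than a confirmed answer: the Calcite SQL reference describes a multi-argument COUNT as counting the rows whose value is "wholly not null", i.e. rows where every argument is non-NULL. Under that assumption, the CASE expression in the Filter above can be modelled with the small Python sketch below. The function names are hypothetical and are not Drill or Calcite APIs; None stands in for SQL NULL, and the sample rows are the (a1, b1) and (a2, b2) columns from the issue description.

```python
# Illustrative model only: hypothetical helper names, not Drill or Calcite code.
# None stands in for SQL NULL.

def count_all_non_null(rows):
    """Assumed semantics of COUNT(a, b): rows where every argument is non-NULL."""
    return sum(1 for row in rows if all(v is not None for v in row))


def in_subquery(key, sub_rows):
    """Three-valued '(a, b) IN (subquery)', following the order of the CASE
    branches in the plan's Filter. Returns True, False, or None (NULL)."""
    c = len(sub_rows)                  # COUNT()        -> $1 in the Filter
    ck = count_all_non_null(sub_rows)  # COUNT($0, $1)  -> $2 in the Filter
    if c == 0:
        return False                   # empty subquery: IN is false
    if all(v is not None for v in key) and key in sub_rows:
        return True                    # the left join found a match ($7 not NULL)
    if any(v is None for v in key):
        return None                    # NULL in the left-hand key
    if ck < c:
        return None                    # subquery has a row containing NULL
    return False


def not_in_subquery(key, sub_rows):
    """NOT applied to a three-valued result: NULL stays NULL."""
    result = in_subquery(key, sub_rows)
    return None if result is None else not result


# (a1, b1) from t1 and (a2, b2) from t2 in the issue description.
T1 = [(1, "aaaaa"), (2, "bbbbb"), (3, "ccccc"), (4, None), (5, "eeeee"),
      (6, "fffff"), (7, "ggggg"), (None, "hhhhh"), (9, "iiiii"), (10, "jjjjj")]
T2 = [(0, "zzz"), (1, "aaaaa"), (2, "bbbbb"), (2, "bbbbb"), (2, "bbbbb"),
      (3, "ccccc"), (4, "ddddd"), (5, "eeeee"), (6, "fffff"), (7, "ggggg"),
      (7, "ggggg"), (8, "hhhhh"), (9, "iiiii")]

# A WHERE clause keeps only rows for which the predicate is TRUE.
in_count = sum(1 for k in T1 if in_subquery(k, T2) is True)
not_in_count = sum(1 for k in T1 if not_in_subquery(k, T2) is True)
print(in_count, not_in_count)
```

With the t1 and t2 rows from the issue description, this model keeps 7 rows for IN, which agrees with the IN result reported there, and 1 row for NOT IN. The purpose of the second count is only to detect whether the subquery contains a row with a NULL key column, in which case a non-matching NOT IN must evaluate to NULL rather than TRUE.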

> Assert when NOT IN clause contains multiple columns
> ---------------------------------------------------
>
>                 Key: DRILL-2235
>                 URL: https://issues.apache.org/jira/browse/DRILL-2235
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.8.0
>            Reporter: Victoria Markman
>            Assignee: Aman Sinha
>             Fix For: 1.2.0
>
>
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> |     a1     |     b1     |     c1     |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.056 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+------------+
> |     a2     |     b2     |     c2     |
> +------------+------------+------------+
> | 0          | zzz        | 2014-12-31 |
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | ddddd      | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | 7          | ggggg      | 2015-01-07 |
> | 8          | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | 2015-01-09 |
> +------------+------------+------------+
> 13 rows selected (0.069 seconds)
> {code}
> IN clause returns correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from t1 where (a1, b1) in (select a2, b2 from t2);
> +------------+
> |   EXPR$0   |
> +------------+
> | 7          |
> +------------+
> 1 row selected (0.258 seconds)
> {code}
> NOT IN clause asserts:
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from t1 where (a1, b1) not in (select a2, b2 from t2);
> Query failed: AssertionError: AND(AND(NOT(IS TRUE($7)), IS NOT NULL($3)), IS NOT NULL($4))
> Error: exception while executing query: Failure while executing query. (state=,code=0)
> {code}
> {code}
> #Thu Feb 12 12:13:26 EST 2015
> git.commit.id.abbrev=de89f36
> {code}
> drillbit.log
> {code}
> 2015-02-12 22:47:11,730 [2b22d290-315e-4450-8b3f-9b3590eb20c3:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - State change requested.  PENDING --> FAILED
> org.apache.drill.exec.work.foreman.ForemanException: Unexpected exception during fragment initialization: AND(AND(NOT(IS TRUE($7)), IS NOT NULL($3)), IS NOT NULL($4))
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:197) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.WorkManager$RunnableWrapper.run(WorkManager.java:303) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_71]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_71]
>         at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
> Caused by: java.lang.AssertionError: AND(AND(NOT(IS TRUE($7)), IS NOT NULL($3)), IS NOT NULL($4))
>         at org.eigenbase.rel.FilterRelBase.<init>(FilterRelBase.java:56) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.rel.FilterRel.<init>(FilterRel.java:50) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.rel.CalcRel.createFilter(CalcRel.java:212) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.sql2rel.SqlToRelConverter.convertWhere(SqlToRelConverter.java:840) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.sql2rel.SqlToRelConverter.convertSelectImpl(SqlToRelConverter.java:497) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.sql2rel.SqlToRelConverter.convertSelect(SqlToRelConverter.java:474) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.sql2rel.SqlToRelConverter.convertQueryRecursive(SqlToRelConverter.java:2657) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.eigenbase.sql2rel.SqlToRelConverter.convertQuery(SqlToRelConverter.java:432) ~[optiq-core-0.9-drill-r18.jar:na]
>         at net.hydromatic.optiq.prepare.PlannerImpl.convert(PlannerImpl.java:186) ~[optiq-core-0.9-drill-r18.jar:na]
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:163) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:126) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:145) ~[drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:515) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:188) [drill-java-exec-0.8.0-SNAPSHOT-rebuffed.jar:0.8.0-SNAPSHOT]
>         ... 4 common frames omitted
> 2015-02-12 22:47:11,747 [2b22d290-315e-4450-8b3f-9b3590eb20c3:foreman] ERROR o.a.drill.exec.work.foreman.Foreman - Error 8f0bb8dd-deac-4846-9608-e941da9035e8: AssertionError: AND(AND(NOT(IS TRUE($7)), IS NOT NULL($3)), IS NOT NULL($4))
> {code}


