Posted to issues-all@impala.apache.org by "Kurt Deschler (Jira)" <ji...@apache.org> on 2020/03/03 05:51:00 UTC

[jira] [Commented] (IMPALA-9429) Unioned partition columns break partition pruning

    [ https://issues.apache.org/jira/browse/IMPALA-9429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049925#comment-17049925 ] 

Kurt Deschler commented on IMPALA-9429:
---------------------------------------

Analysis of a slightly simpler example:

select col3 from (select col2, col3 from debug_with_partition union all select col2, 1 col3 from debug_without_partition) a where col2 = 0 or col3 = 0;

The problem manifests as follows:

1. The conjuncts list contains a CompoundPredicate of the form (SlotRef = constant OR SlotRef = constant). This is not simplified during analysis.

2. SingleNodePlanner.createUnionPlan() pushes the conjuncts down to the union arms. Because the col3 select item in the second arm is the constant 1, substitution creates a constant BinaryPredicate Expr that is eligible to be folded to a boolean literal.

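3. HdfsPartitionPruner.canEvalUsingPartitionMd() calls the constant folder to check that constants have already been folded. A precondition there checks that any remaining BinaryPredicate Exprs are preserved. The unfolded constant predicate from step 2 trips this check, which matches the IllegalStateException in the reported stack trace.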

A plausible fix is to fold constants in SingleNodePlanner.createUnionPlan() after substitution, as follows:

{{--- a/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java}}
{{+++ b/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java}}
{{@@ -1725,7 +1725,9 @@ public class SingleNodePlanner {}}
{{       for (UnionOperand op: unionStmt.getOperands()) {}}
{{         List<Expr> opConjuncts =}}
{{             Expr.substituteList(conjuncts, op.getSmap(), analyzer, false);}}
{{+        analyzer.getConstantFolder().rewriteList(opConjuncts, analyzer);}}
{{         op.getAnalyzer().registerConjuncts(opConjuncts);}}
{{       }}}
{{       analyzer.markConjunctsAssigned(conjuncts);}}
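
To illustrate why folding after substitution helps, here is a minimal sketch using a toy expression tree. It is not Impala code: the classes FoldSketch, SlotRef, IntLit, BoolLit, Eq and Or and the methods substitute() and fold() are hypothetical stand-ins that only mimic the idea of Expr.substituteList() followed by a constant-folding pass. The substitution map models the second union arm of the example query, where col3 is the constant 1.

```java
import java.util.Collections;
import java.util.Map;

// Toy model only: hypothetical stand-ins, not Impala's Expr hierarchy.
public class FoldSketch {
  interface Expr {}

  static final class SlotRef implements Expr {
    final String name;
    SlotRef(String name) { this.name = name; }
    @Override public String toString() { return name; }
  }

  static final class IntLit implements Expr {
    final int value;
    IntLit(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
  }

  static final class BoolLit implements Expr {
    final boolean value;
    BoolLit(boolean value) { this.value = value; }
    @Override public String toString() { return value ? "TRUE" : "FALSE"; }
  }

  static final class Eq implements Expr {
    final Expr left, right;
    Eq(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return left + " = " + right; }
  }

  static final class Or implements Expr {
    final Expr left, right;
    Or(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return left + " OR " + right; }
  }

  // Replace slot references with the union arm's select-list expressions.
  static Expr substitute(Expr e, Map<String, Expr> smap) {
    if (e instanceof SlotRef) {
      Expr mapped = smap.get(((SlotRef) e).name);
      return mapped != null ? mapped : e;
    }
    if (e instanceof Eq) {
      Eq eq = (Eq) e;
      return new Eq(substitute(eq.left, smap), substitute(eq.right, smap));
    }
    if (e instanceof Or) {
      Or or = (Or) e;
      return new Or(substitute(or.left, smap), substitute(or.right, smap));
    }
    return e;
  }

  // Fold an equality between two integer literals into a boolean literal.
  // No further simplification (such as dropping "OR FALSE") is modeled here.
  static Expr fold(Expr e) {
    if (e instanceof Eq) {
      Eq eq = (Eq) e;
      Expr l = fold(eq.left);
      Expr r = fold(eq.right);
      if (l instanceof IntLit && r instanceof IntLit) {
        return new BoolLit(((IntLit) l).value == ((IntLit) r).value);
      }
      return new Eq(l, r);
    }
    if (e instanceof Or) {
      Or or = (Or) e;
      return new Or(fold(or.left), fold(or.right));
    }
    return e;
  }

  public static void main(String[] args) {
    // WHERE col2 = 0 OR col3 = 0
    Expr conjunct = new Or(new Eq(new SlotRef("col2"), new IntLit(0)),
                           new Eq(new SlotRef("col3"), new IntLit(0)));
    // Second union arm: "select col2, 1 col3 from debug_without_partition"
    Map<String, Expr> smap =
        Collections.<String, Expr>singletonMap("col3", new IntLit(1));

    Expr pushed = substitute(conjunct, smap);
    System.out.println(pushed);        // col2 = 0 OR 1 = 0
    System.out.println(fold(pushed));  // col2 = 0 OR FALSE
  }
}
```

In this toy model, the pushed-down conjunct still contains the constant comparison 1 = 0 until fold() runs, which corresponds to the state the partition pruner does not expect to see. After folding, only the predicate on col2 and a boolean literal remain.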

> Unioned partition columns break partition pruning
> -------------------------------------------------
>
>                 Key: IMPALA-9429
>                 URL: https://issues.apache.org/jira/browse/IMPALA-9429
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Frontend
>    Affects Versions: Impala 3.2.0
>            Reporter: Max Mizikar
>            Assignee: Kurt Deschler
>            Priority: Critical
>
> We have different partition granularity on our landing tables and our compacted tables. We use a view to union the landing and compacted tables. After an upgrade from cdh5.15 (Impala v2.12.0) to cdh6.3 (Impala 3.2.0) we started having issues with our unioned tables. I've come up with this as the smallest breaking example.
> {code:java}
> [:21000] debug> create table debug_with_partition (col1 int) partitioned by (col2 int, col3 int);                                                                                                                                                                                             
> Query: create table debug_with_partition (col1 int) partitioned by (col2 int, col3 int)
> +-------------------------+
> | summary                 |
> +-------------------------+
> | Table has been created. |
> +-------------------------+
> Fetched 1 row(s) in 0.09s
> [:21000] debug> create table debug_without_partition (col1 int) partitioned by (col2 int);                                                                                                                                                                                                    
> Query: create table debug_without_partition (col1 int) partitioned by (col2 int)
> +-------------------------+
> | summary                 |
> +-------------------------+
> | Table has been created. |
> +-------------------------+
> Fetched 1 row(s) in 0.03s
> [:21000] debug> create view debug as select col1, col2, col3 from debug_with_partition union all select col1, col2, null from debug_without_partition;                                                                                                                                        
> Query: create view debug as select col1, col2, col3 from debug_with_partition union all select col1, col2, null from debug_without_partition
> Query submitted at: 2020-02-26 17:04:58 (Coordinator: :25000)
> Query progress can be monitored at: :25000/query_plan?query_id=28453bdf5f919fe9:66fef22200000000
> +------------------------+
> | summary                |
> +------------------------+
> | View has been created. |
> +------------------------+
> Fetched 1 row(s) in 5.65s
> [:21000] debug> select * from debug where col2 = 0 or col3 = 0;                                                                                                                                                                                                                               
> Query: select * from debug where col2 = 0 or col3 = 0
> Query submitted at: 2020-02-26 17:05:21 (Coordinator: t:25000)
> ERROR: IllegalStateException: null
> {code}
> Here is what I find in the log
> {code:java}
> I0226 17:05:21.099532 129442 jni-util.cc:256] c34e2a72018579fe:3d7388e100000000] java.lang.IllegalStateException
>                                 at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
>                                 at org.apache.impala.planner.HdfsPartitionPruner.canEvalUsingPartitionMd(HdfsPartitionPruner.java:196)
>                                 at org.apache.impala.planner.HdfsPartitionPruner.canEvalUsingPartitionMd(HdfsPartitionPruner.java:211)
>                                 at org.apache.impala.planner.HdfsPartitionPruner.prunePartitions(HdfsPartitionPruner.java:131)
>                                 at org.apache.impala.planner.SingleNodePlanner.createHdfsScanPlan(SingleNodePlanner.java:1257)
>                                 at org.apache.impala.planner.SingleNodePlanner.createScanNode(SingleNodePlanner.java:1348)
>                                 at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1535)
>                                 at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:814)
>                                 at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:650)
>                                 at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:258)
>                                 at org.apache.impala.planner.SingleNodePlanner.createUnionPlan(SingleNodePlanner.java:1584)
>                                 at org.apache.impala.planner.SingleNodePlanner.createUnionPlan(SingleNodePlanner.java:1651)
>                                 at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:280)
>                                 at org.apache.impala.planner.SingleNodePlanner.createInlineViewPlan(SingleNodePlanner.java:1088)
>                                 at org.apache.impala.planner.SingleNodePlanner.createTableRefNode(SingleNodePlanner.java:1546)
>                                 at org.apache.impala.planner.SingleNodePlanner.createTableRefsPlan(SingleNodePlanner.java:814)
>                                 at org.apache.impala.planner.SingleNodePlanner.createSelectPlan(SingleNodePlanner.java:650)
>                                 at org.apache.impala.planner.SingleNodePlanner.createQueryPlan(SingleNodePlanner.java:258)
>                                 at org.apache.impala.planner.SingleNodePlanner.createSingleNodePlan(SingleNodePlanner.java:148)
>                                 at org.apache.impala.planner.Planner.createPlan(Planner.java:103)
>                                 at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1171)
>                                 at org.apache.impala.service.Frontend.getPlannedExecRequest(Frontend.java:1466)
>                                 at org.apache.impala.service.Frontend.doCreateExecRequest(Frontend.java:1345)
>                                 at org.apache.impala.service.Frontend.getTExecRequest(Frontend.java:1252)
>                                 at org.apache.impala.service.Frontend.createExecRequest(Frontend.java:1222)
>                                 at org.apache.impala.service.JniFrontend.createExecRequest(JniFrontend.java:167)
> I0226 17:05:21.099617 129442 status.cc:124] c34e2a72018579fe:3d7388e100000000] IllegalStateException: null
>     @           0xb4c459
>     @          0x114fe2e
>     @          0x102ab53
>     @          0x1052ba2
>     @          0x105e88c
>     @          0x109e5be
>     @          0x138fee4
>     @          0x138f39c
>     @           0xb18169
>     @           0xf2d1d8
>     @           0xf23c4e
>     @           0xf24ae1
>     @          0x11c5e0f
>     @          0x11c69b9
>     @          0x1840569
>     @     0x7f2ef82926b9
>     @     0x7f2ef7fc841c
> {code}
> I've done some debugging from the shell and found that the following queries work.
> Querying on just the null-filled column:
> {code:java}
> [:21000] debug> select * from debug where col3 = 0;
> Query: select * from debug where col3 = 0
> Query submitted at: 2020-02-26 17:07:07 (Coordinator: :25000)
> Query progress can be monitored at: :25000/query_plan?query_id=1b44157731b6f5ff:d052c2c600000000
> Fetched 0 row(s) in 0.11s
> {code}
> A query with an AND on the null-filled column:
> {code:java}
> [:21000] debug> select * from debug where col2 = 0 and col3 = 0;
> Query: select * from debug where col2 = 0 and col3 = 0
> Query submitted at: 2020-02-26 17:07:27 (Coordinator: :25000)
> Query progress can be monitored at: :25000/query_plan?query_id=334f7fbf2367a558:6ebe4d6100000000
> Fetched 0 row(s) in 0.11s
> {code}
> Casting the null-filled column:
> {code:java}
> [:21000] debug> select * from debug where col2 = 0 or cast(col3 as int) = 0;
> Query: select * from debug where col2 = 0 or cast(col3 as int) = 0
> Query submitted at: 2020-02-26 17:08:26 (Coordinator: :25000)
> Query progress can be monitored at: :25000/query_plan?query_id=1a4d43d8fc9fc45d:662922b900000000
> Fetched 0 row(s) in 0.11s
> {code}
> Please let me know if there is anything else I can do to help!


