You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by jo...@apache.org on 2020/04/04 17:40:30 UTC

[impala] branch branch-3.4.0 updated (8842ccf -> 9dde3c1)

This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a change to branch branch-3.4.0
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 8842ccf  IMPALA-9582: Upgrade thrift_sasl to 0.4.2 for impala-shell
     new 4ba4ab7  IMPALA-9429: Unioned partition columns break partition pruning
     new 9dde3c1  IMPALA-9571: Fix Impala crash from unexpected boost filesystem_error exception

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/util/filesystem-util.cc                     |  19 ++-
 .../apache/impala/planner/HdfsPartitionPruner.java |  15 +-
 .../queries/PlannerTest/union.test                 | 166 +++++++++++++++++++++
 3 files changed, 189 insertions(+), 11 deletions(-)


[impala] 02/02: IMPALA-9571: Fix Impala crash from unexpected boost filesystem_error exception

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch branch-3.4.0
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 9dde3c15473d9d0ccf631c9e64a4617613d6394d
Author: Joe McDonnell <jo...@cloudera.com>
AuthorDate: Thu Apr 2 15:19:42 2020 -0700

    IMPALA-9571: Fix Impala crash from unexpected boost filesystem_error exception
    
    The no-exception signature of boost::filesystem::remove_all() has a
    longstanding bug where it can throw an exception even for the
    no-exceptions interface. Boost recently added the "noexcept" specifier
    to the no-exception signature of remove_all(). When a "noexcept"
    function throws an exception, it causes the program to call std::terminate()
    which aborts the program. This makes the try/catch block around
    remove_all() useless.
    
    As a point fix, this switches FilesystemUtil to use the remove_all()
    signature that can throw exceptions. This code already had a try/catch
    block around remove_all(), so now it just uses that to get the error
    code.
    
    Newer versions of boost (1.63+) fix the underlying problem, so this
    code can be removed when we upgrade boost.
    
    Testing:
     - Ran core tests
     - Ran FilesystemUtil backend test
    
    Change-Id: I23016f37401604fc41fc99c0a056439a31333e3f
    Reviewed-on: http://gerrit.cloudera.org:8080/15640
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Csaba Ringhofer <cs...@cloudera.com>
    (cherry picked from commit 76061331781e67b90d7750abadfaf8b013dc64a5)
---
 be/src/util/filesystem-util.cc | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/be/src/util/filesystem-util.cc b/be/src/util/filesystem-util.cc
index ea0a10a..b4fce78 100644
--- a/be/src/util/filesystem-util.cc
+++ b/be/src/util/filesystem-util.cc
@@ -87,10 +87,16 @@ Status FileSystemUtil::RemoveAndCreateDirectory(const string& directory) {
     // the check for existence above and the removal here. If the directory is removed in
     // this window, we may get "no_such_file_or_directory" error which is fine.
     //
-    // There is a bug in boost library (as of version 1.6) which may lead to unexpected
-    // exceptions even though we are using the no-exceptions interface. See IMPALA-2846.
+    // There is a bug in boost library (as of version 1.61) which may lead to unexpected
+    // exceptions from the no-exceptions interface. Even worse, the no-exceptions
+    // interface is marked as noexcept, so the unexpected exception will cause the
+    // program to immediately call std::terminate(). This renders the try/catch block
+    // useless. For now, this uses the interface that can throw exceptions and
+    // gets the errorcode from the exception. See IMPALA-2846 + IMPALA-9571.
+    // TODO: Newer boost has a fix for this bug, so this can be switched back after
+    // upgrading boost.
     try {
-      filesystem::remove_all(directory, errcode);
+      filesystem::remove_all(directory);
     } catch (filesystem::filesystem_error& e) {
       errcode = e.code();
     }
@@ -111,10 +117,11 @@ Status FileSystemUtil::RemoveAndCreateDirectory(const string& directory) {
 Status FileSystemUtil::RemovePaths(const vector<string>& directories) {
   for (int i = 0; i < directories.size(); ++i) {
     error_code errcode;
-    // There is a bug in boost library (as of version 1.6) which may lead to unexpected
-    // exceptions even though we are using the no-exceptions interface. See IMPALA-2846.
+    // This uses the boost remove_all() interface that can throw exceptions to avoid a
+    // bug in the implementation of the no-exceptions remove_all(). See comment in
+    // RemoveAndCreateDirectory() for more details.
     try {
-      filesystem::remove_all(directories[i], errcode);
+      filesystem::remove_all(directories[i]);
     } catch (filesystem::filesystem_error& e) {
       errcode = e.code();
     }


[impala] 01/02: IMPALA-9429: Unioned partition columns break partition pruning

Posted by jo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

joemcdonnell pushed a commit to branch branch-3.4.0
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4ba4ab733302e7d5839293c2a556528f60da7f91
Author: Kurt Deschler <kd...@cloudera.com>
AuthorDate: Tue Mar 31 22:35:57 2020 -0500

    IMPALA-9429: Unioned partition columns break partition pruning
    
    In the case of a union query where predicates are pushed into the union,
    predicate Exprs can contain SlotReft that are transformed into constants
    after analysis and eligible for constant folding. During partition
    pruning there is a check that eligible constant folding has already
    occurred which was failing and reporting IllegalStateException since the
    surrounding code only handles specific cases.
    
    This fix adds FoldConstantsRule to the rewriter used by
    HdfsPartitionPruner.prunePartitions so that pruning predicates have any
    constants folded as expected.
    
    Testing:
    Testcases added to PlannerTest/union.test based on provided repo using
    alltypes tables.
    
    Change-Id: I1c1384c2cd1ad5f7024449196f9a348ecdccb60b
    Reviewed-on: http://gerrit.cloudera.org:8080/15371
    Reviewed-by: Quanlong Huang <hu...@gmail.com>
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    (cherry picked from commit 3bd5c4aaa2c0ba00856effab83d8cb9bc1e49406)
---
 .../apache/impala/planner/HdfsPartitionPruner.java |  15 +-
 .../queries/PlannerTest/union.test                 | 166 +++++++++++++++++++++
 2 files changed, 176 insertions(+), 5 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java b/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
index 1706520..98ab761 100644
--- a/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
+++ b/fe/src/main/java/org/apache/impala/planner/HdfsPartitionPruner.java
@@ -18,6 +18,7 @@
 package org.apache.impala.planner;
 
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
@@ -47,6 +48,7 @@ import org.apache.impala.common.ImpalaException;
 import org.apache.impala.common.Pair;
 import org.apache.impala.rewrite.BetweenToCompoundRule;
 import org.apache.impala.rewrite.ExprRewriter;
+import org.apache.impala.rewrite.FoldConstantsRule;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -87,7 +89,9 @@ public class HdfsPartitionPruner {
   // For converting BetweenPredicates to CompoundPredicates so they can be
   // executed in the BE.
   private final ExprRewriter exprRewriter_ =
-      new ExprRewriter(BetweenToCompoundRule.INSTANCE);
+      new ExprRewriter(new ArrayList<>(
+          Arrays.asList(BetweenToCompoundRule.INSTANCE,
+                        FoldConstantsRule.INSTANCE)));
 
   public HdfsPartitionPruner(TupleDescriptor tupleDesc) {
     Preconditions.checkState(tupleDesc.getTable() instanceof FeFsTable);
@@ -99,7 +103,8 @@ public class HdfsPartitionPruner {
   /**
    * Return a list of partitions left after applying the conjuncts.
    * Conjuncts used for filtering will be removed from the list 'conjuncts' and
-   * returned as the second item in the returned Pair. These expressions can be
+   * returned as the second item in the returned Pair. These expressions
+   * include planner rewrites such as view reference substitutions and can be
    * shown in the EXPLAIN output.
    *
    * If 'allowEmpty' is False, empty partitions are not returned.
@@ -133,7 +138,7 @@ public class HdfsPartitionPruner {
         } else {
           partitionFilters.add(new HdfsPartitionFilter(clonedConjunct, tbl_, analyzer));
         }
-        partitionConjuncts.add(conjunct);
+        partitionConjuncts.add(clonedConjunct);
         it.remove();
       }
     }
@@ -189,8 +194,8 @@ public class HdfsPartitionPruner {
     if (expr instanceof BinaryPredicate) {
       // Evaluate any constant expression in the BE
       try {
-        // TODO: Analyzer should already have done constant folding
-        // and rewrite -- unless this is a copy of an expression taken
+        // Constant folding and rewrite should already be done by this
+        // point -- unless this is a copy of an expression taken
         // before analysis, which would introduce its own issues.
         expr = analyzer.getConstantFolder().rewrite(expr, analyzer);
         Preconditions.checkState(expr instanceof BinaryPredicate);
diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/union.test b/testdata/workloads/functional-planner/queries/PlannerTest/union.test
index e410f9d..c6ad4ef 100644
--- a/testdata/workloads/functional-planner/queries/PlannerTest/union.test
+++ b/testdata/workloads/functional-planner/queries/PlannerTest/union.test
@@ -4464,3 +4464,169 @@ Per-Host Resources: mem-estimate=32.00MB mem-reservation=8.00KB thread-reservati
    tuple-ids=0 row-size=0B cardinality=8
    in pipelines: 01(GETNEXT)
 ====
+# IMPALA-9429: Test partition elimination merging inputs with different
+# partitioning
+select month from (
+    select year, month from functional.alltypes
+    union all
+    select year, 1 month from functional.alltypesagg
+  ) a where year = 0 or month = 0;
+---- QUERYOPTIONS
+explain_level=2
+---- DISTRIBUTEDPLAN
+F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+|  Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
+PLAN-ROOT SINK
+|  output exprs: month
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|
+03:EXCHANGE [UNPARTITIONED]
+|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+|  tuple-ids=2 row-size=4B cardinality=0
+|  in pipelines: 01(GETNEXT), 02(GETNEXT)
+|
+F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
+00:UNION
+|  pass-through-operands: 01
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|  tuple-ids=2 row-size=4B cardinality=0
+|  in pipelines: 01(GETNEXT), 02(GETNEXT)
+|
+|--02:SCAN HDFS [functional.alltypesagg, RANDOM]
+|     partition predicates: `year` = CAST(0 AS INT) OR FALSE
+|     partitions=0/11 files=0 size=0B
+|     stored statistics:
+|       table: rows=11.00K size=814.73KB
+|       partitions: 0/0 rows=unavailable
+|       columns: all
+|     extrapolated-rows=disabled max-scan-range-rows=0
+|     mem-estimate=0B mem-reservation=0B thread-reservation=0
+|     tuple-ids=1 row-size=0B cardinality=0
+|     in pipelines: 02(GETNEXT)
+|
+01:SCAN HDFS [functional.alltypes, RANDOM]
+   partition predicates: functional.alltypes.year = CAST(0 AS INT) OR functional.alltypes.month = CAST(0 AS INT)
+   partitions=0/24 files=0 size=0B
+   stored statistics:
+     table: rows=7.30K size=478.45KB
+     partitions: 0/0 rows=unavailable
+     columns: all
+   extrapolated-rows=disabled max-scan-range-rows=0
+   mem-estimate=0B mem-reservation=0B thread-reservation=0
+   tuple-ids=0 row-size=4B cardinality=0
+   in pipelines: 01(GETNEXT)
+====
+select month from (
+    select year, month from functional.alltypes
+    union all
+    select year, 1 month from functional.alltypesagg
+  ) a where year = NULL or month = NULL;
+---- QUERYOPTIONS
+explain_level=2
+---- DISTRIBUTEDPLAN
+F03:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+|  Per-Host Resources: mem-estimate=16.00KB mem-reservation=0B thread-reservation=1
+PLAN-ROOT SINK
+|  output exprs: month
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|
+03:EXCHANGE [UNPARTITIONED]
+|  mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+|  tuple-ids=2 row-size=4B cardinality=0
+|  in pipelines: 01(GETNEXT), 02(GETNEXT)
+|
+F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Host Resources: mem-estimate=0B mem-reservation=0B thread-reservation=1
+00:UNION
+|  pass-through-operands: 01
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|  tuple-ids=2 row-size=4B cardinality=0
+|  in pipelines: 01(GETNEXT), 02(GETNEXT)
+|
+|--02:SCAN HDFS [functional.alltypesagg, RANDOM]
+|     partition predicates: `year` = NULL OR NULL
+|     partitions=0/11 files=0 size=0B
+|     stored statistics:
+|       table: rows=11.00K size=814.73KB
+|       partitions: 0/0 rows=unavailable
+|       columns: all
+|     extrapolated-rows=disabled max-scan-range-rows=0
+|     mem-estimate=0B mem-reservation=0B thread-reservation=0
+|     tuple-ids=1 row-size=0B cardinality=0
+|     in pipelines: 02(GETNEXT)
+|
+01:SCAN HDFS [functional.alltypes, RANDOM]
+   partition predicates: functional.alltypes.year = NULL OR functional.alltypes.month = NULL
+   partitions=0/24 files=0 size=0B
+   stored statistics:
+     table: rows=7.30K size=478.45KB
+     partitions: 0/0 rows=unavailable
+     columns: all
+   extrapolated-rows=disabled max-scan-range-rows=0
+   mem-estimate=0B mem-reservation=0B thread-reservation=0
+   tuple-ids=0 row-size=4B cardinality=0
+   in pipelines: 01(GETNEXT)
+====
+select year, month from (
+   select year, 1 month from functional.alltypesagg
+ ) t where year = 2010 or month = 0;
+---- QUERYOPTIONS
+explain_level=2
+---- DISTRIBUTEDPLAN
+F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+|  Per-Host Resources: mem-estimate=38.32KB mem-reservation=0B thread-reservation=1
+PLAN-ROOT SINK
+|  output exprs: `year`, CAST(1 AS TINYINT)
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|
+01:EXCHANGE [UNPARTITIONED]
+|  mem-estimate=38.32KB mem-reservation=0B thread-reservation=0
+|  tuple-ids=0 row-size=4B cardinality=11.00K
+|  in pipelines: 00(GETNEXT)
+|
+F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Host Resources: mem-estimate=80.00MB mem-reservation=128.00KB thread-reservation=2
+00:SCAN HDFS [functional.alltypesagg, RANDOM]
+   partition predicates: `year` = CAST(2010 AS INT) OR FALSE
+   HDFS partitions=11/11 files=11 size=814.73KB
+   stored statistics:
+     table: rows=11.00K size=814.73KB
+     partitions: 11/11 rows=11.00K
+     columns: all
+   extrapolated-rows=disabled max-scan-range-rows=1.00K
+   mem-estimate=80.00MB mem-reservation=128.00KB thread-reservation=1
+   tuple-ids=0 row-size=4B cardinality=11.00K
+   in pipelines: 00(GETNEXT)
+====
+select year, month from (
+   select year, 1 month from functional.alltypesagg
+ ) t where year = 2010 or month between 1 and 2;
+---- QUERYOPTIONS
+explain_level=2
+---- DISTRIBUTEDPLAN
+F01:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+|  Per-Host Resources: mem-estimate=38.32KB mem-reservation=0B thread-reservation=1
+PLAN-ROOT SINK
+|  output exprs: `year`, CAST(1 AS TINYINT)
+|  mem-estimate=0B mem-reservation=0B thread-reservation=0
+|
+01:EXCHANGE [UNPARTITIONED]
+|  mem-estimate=38.32KB mem-reservation=0B thread-reservation=0
+|  tuple-ids=0 row-size=4B cardinality=11.00K
+|  in pipelines: 00(GETNEXT)
+|
+F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Host Resources: mem-estimate=80.00MB mem-reservation=128.00KB thread-reservation=2
+00:SCAN HDFS [functional.alltypesagg, RANDOM]
+   partition predicates: `year` = CAST(2010 AS INT) OR TRUE
+   HDFS partitions=11/11 files=11 size=814.73KB
+   stored statistics:
+     table: rows=11.00K size=814.73KB
+     partitions: 11/11 rows=11.00K
+     columns: all
+   extrapolated-rows=disabled max-scan-range-rows=1.00K
+   mem-estimate=80.00MB mem-reservation=128.00KB thread-reservation=1
+   tuple-ids=0 row-size=4B cardinality=11.00K
+   in pipelines: 00(GETNEXT)
+====