Posted to commits@impala.apache.org by ta...@apache.org on 2020/02/18 01:40:28 UTC

[impala] branch master updated (d75562a -> 796f5cc)

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from d75562a  IMPALA-9256: Refactor constraint information into a separate class.
     new ba903de  Add log of created files for data load
     new 796f5cc  IMPALA-8909: Fix incorrectly omitting implicit cast in pre-insert sort

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 fe/src/main/java/org/apache/impala/planner/SortNode.java    |  8 +++++---
 testdata/bin/create-load-data.sh                            |  3 +++
 .../functional-query/queries/QueryTest/kudu_insert.test     | 13 +++++++++++++
 3 files changed, 21 insertions(+), 3 deletions(-)


[impala] 02/02: IMPALA-8909: Fix incorrectly omitting implicit cast in pre-insert sort

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 796f5cc43d545a645f9eed3d13934375f31a3ce8
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Thu Feb 13 18:59:07 2020 +0100

    IMPALA-8909: Fix incorrectly omitting implicit cast in pre-insert sort
    
    Inserts can add a sort node that orders the rows by partitioning
    and Kudu primary key columns (a.k.a. a clustered insert). The issue
    occurred when the target column was a timestamp and the source
    was an expression that returned a string (e.g. concat()). Impala
    adds an implicit cast to convert the strings to timestamps before
    sorting, but this cast was incorrectly removed later during expression
    substitution.
    
    This led to hitting a DCHECK in debug builds and an uninformative
    error message in release builds.
    
    Note that the cast in question is not visible in EXPLAIN output.
    EXPLAIN should include implicit casts at explain_level=2 and above
    since https://gerrit.cloudera.org/#/c/11719/ , but the cast is still
    not shown in some expressions. I consider this to be a separate issue.
    
    Testing:
    - added an EE test that used to crash
    - ran planner / sort / kudu_insert tests
    
    Change-Id: Icca8ab1456a3b840a47833119c9d4fd31a1fff90
    Reviewed-on: http://gerrit.cloudera.org:8080/15217
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 fe/src/main/java/org/apache/impala/planner/SortNode.java    |  8 +++++---
 .../functional-query/queries/QueryTest/kudu_insert.test     | 13 +++++++++++++
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/planner/SortNode.java b/fe/src/main/java/org/apache/impala/planner/SortNode.java
index c05969c..ba894c3 100644
--- a/fe/src/main/java/org/apache/impala/planner/SortNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/SortNode.java
@@ -143,13 +143,15 @@ public class SortNode extends PlanNode {
       outputSmap_.put(slotExprs.get(i), new SlotRef(sortTupleSlots.get(i)));
     }
     ExprSubstitutionMap childSmap = getCombinedChildSmap();
+    // Preserve type as resolvedTupleExprs_ will be used to materialize the tuple and the
+    // layout is already calculated.
     resolvedTupleExprs_ =
-        Expr.substituteList(resolvedTupleExprs_, childSmap, analyzer, false);
+        Expr.substituteList(resolvedTupleExprs_, childSmap, analyzer, true);
 
     // Remap the ordering exprs to the tuple materialized by this sort node. The mapping
     // is a composition of the childSmap and the outputSmap_ because the child node may
-    // have also remapped its input (e.g., as in a a series of (sort->analytic)* nodes).
-    // Parent nodes have have to do the same so set the composition as the outputSmap_.
+    // have also remapped its input (e.g., as in a series of (sort->analytic)* nodes).
+    // Parent nodes have to do the same so set the composition as the outputSmap_.
     outputSmap_ = ExprSubstitutionMap.compose(childSmap, outputSmap_, analyzer);
 
     info_.substituteSortExprs(outputSmap_, analyzer);
diff --git a/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test b/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
index 981a734..d2b734e 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
@@ -320,6 +320,19 @@ NumRowErrors: 0
 row_regex: .*EXEC_SINGLE_NODE_ROWS_THRESHOLD=0.*
 ====
 ---- QUERY
+create table timestampkey (t timestamp primary key)
+  partition by hash(t) PARTITIONS 2 stored as kudu;
+====
+---- QUERY
+# Regression test for IMPALA-8909 (used to hit a DCHECK in debug builds).
+insert into timestampkey select concat(string_col, " ") from functional.alltypessmall;
+---- RESULTS
+: 0
+---- RUNTIME_PROFILE
+NumModifiedRows: 0
+NumRowErrors: 100
+====
+---- QUERY
 # Table with default values
 create table tbl_with_defaults (a int primary key, b int null default 10,
   c int not null default 100, d int default 1000, e int null, f int not null,
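

The key change in SortNode.java above is the last argument of Expr.substituteList
flipping from false to true, which (per the new comment) keeps the substituted
expressions' original types so the already-computed sort tuple layout stays valid.
The standalone Java sketch below only illustrates that idea with simplified
stand-in classes: Expr, Cast, Type and this substituteList signature are
hypothetical simplifications, not Impala's actual frontend API, and the scenario
in main() is purely illustrative.

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TypePreservingSubstitution {
  enum Type { STRING, TIMESTAMP }

  static class Expr {
    final String sql;
    final Type type;
    Expr(String sql, Type type) { this.sql = sql; this.type = type; }
    @Override public String toString() { return sql + " [" + type + "]"; }
  }

  static class Cast extends Expr {
    Cast(Expr child, Type target) {
      super("CAST(" + child.sql + " AS " + target + ")", target);
    }
  }

  // Substitute each expression that has a mapping in 'smap'. When
  // preserveRootTypes is true and the replacement's type differs from the
  // original, re-wrap it in a cast so callers that already sized a tuple
  // layout for the original type (like the pre-insert sort) still see it.
  static List<Expr> substituteList(List<Expr> exprs, Map<Expr, Expr> smap,
      boolean preserveRootTypes) {
    List<Expr> result = new ArrayList<>();
    for (Expr e : exprs) {
      Expr substituted = smap.getOrDefault(e, e);
      if (preserveRootTypes && substituted.type != e.type) {
        substituted = new Cast(substituted, e.type);
      }
      result.add(substituted);
    }
    return result;
  }

  public static void main(String[] args) {
    // Illustrative only: a TIMESTAMP-typed expression whose substitution
    // target happens to be STRING-typed (e.g. the raw concat() result).
    Expr original =
        new Expr("CAST(concat(string_col, ' ') AS TIMESTAMP)", Type.TIMESTAMP);
    Expr replacement = new Expr("concat(string_col, ' ')", Type.STRING);
    Map<Expr, Expr> smap = Map.of(original, replacement);

    // Without type preservation the result silently becomes STRING;
    // with it, a cast is re-added and the TIMESTAMP type survives.
    System.out.println(substituteList(List.of(original), smap, false));
    System.out.println(substituteList(List.of(original), smap, true));
  }
}

Re-adding the cast only when the type actually changes keeps substitution cheap
in the common case while ensuring consumers of the precomputed layout never see
an unexpected type.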


[impala] 01/02: Add log of created files for data load

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit ba903deba7e529df6d3136589086105a99c878c1
Author: norbert.luksa <no...@cloudera.com>
AuthorDate: Mon Feb 17 15:36:10 2020 +0100

    Add log of created files for data load
    
    As Joe pointed out in IMPALA-9351, debugging issues with missing
    files would be easier if the files created during data load were
    logged.
    
    With this commit, running create-load-data.sh logs the created
    files into created-files.log.
    
    Change-Id: I4f413810c6202a07c19ad1893088feedd9f7278f
    Reviewed-on: http://gerrit.cloudera.org:8080/15234
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 testdata/bin/create-load-data.sh | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index 65d18e3..48ba7f5 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -719,6 +719,9 @@ if [ "${TARGET_FILESYSTEM}" = "hdfs" ]; then
       create-internal-hbase-table
 
   run-step "Checking HDFS health" check-hdfs-health.log check-hdfs-health
+
+  # Saving the list of created files can help in debugging missing files.
+  run-step "Logging created files" created-files.log hdfs dfs -ls -R /test-warehouse
 fi
 
 # TODO: Investigate why all stats are not preserved. Theoretically, we only need to