You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@impala.apache.org by ta...@apache.org on 2020/06/03 18:12:11 UTC

[impala] branch master updated (bfdc5bf -> 37b5599)

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from bfdc5bf  IMPALA-9702: Cleanup unique_database directories
     new 69b0b15  IMPALA-9000: clean up misc TODO-MT comments
     new 37b5599  IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/grouping-aggregator-ir.cc                    |  8 +++++---
 be/src/runtime/fragment-instance-state.h                 |  4 +---
 bin/run-all-tests.sh                                     |  3 +--
 common/thrift/ExecStats.thrift                           |  1 -
 .../tpch/queries/min-multiple-distinct-aggs.test         | 16 ++++++++++++++++
 tests/query_test/test_aggregation.py                     |  3 +++
 6 files changed, 26 insertions(+), 9 deletions(-)
 create mode 100644 testdata/workloads/tpch/queries/min-multiple-distinct-aggs.test

[impala] 01/02: IMPALA-9000: clean up misc TODO-MT comments

Posted by ta...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 69b0b15f59169520cdffdbc31f90a5a04aea79e0
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Mon Jun 1 14:58:06 2020 -0700

    IMPALA-9000: clean up misc TODO-MT comments
    
    Filed IMPALA-9812, IMPALA-9811 for misc cleanup tasks and
    IMPALA-9814 for the planner estimation issue in AnalyticPlanner.
    
    Change-Id: Ieeffebf6883427ca88640d4fe57159078516ba11
    Reviewed-on: http://gerrit.cloudera.org:8080/16023
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/runtime/fragment-instance-state.h | 4 +---
 bin/run-all-tests.sh                     | 3 +--
 common/thrift/ExecStats.thrift           | 1 -
 3 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/be/src/runtime/fragment-instance-state.h b/be/src/runtime/fragment-instance-state.h
index 3031174..0737050 100644
--- a/be/src/runtime/fragment-instance-state.h
+++ b/be/src/runtime/fragment-instance-state.h
@@ -249,8 +249,7 @@ class FragmentInstanceState {
   /// on a single host will have the same value for this counter.
   RuntimeProfile::Counter* per_host_mem_usage_ = nullptr;
 
-  /// Number of rows returned by this fragment
-  /// TODO: by this instance?
+  /// Number of rows returned by this fragment instance.
   RuntimeProfile::Counter* rows_produced_counter_ = nullptr;
 
   /// Average number of thread tokens for the duration of the fragment instance execution.
@@ -260,7 +259,6 @@ class FragmentInstanceState {
   /// additional tokens.
   /// This is a measure of how much CPU resources this instance used during the course
   /// of the execution.
-  /// TODO-MT: remove
   RuntimeProfile::Counter* avg_thread_tokens_ = nullptr;
 
   /// Sampled memory usage at even time intervals.
diff --git a/bin/run-all-tests.sh b/bin/run-all-tests.sh
index 77ce7c4..3a1f8b8 100755
--- a/bin/run-all-tests.sh
+++ b/bin/run-all-tests.sh
@@ -160,7 +160,7 @@ ulimit -c unlimited || true
 
 # Helper function to start Impala cluster.
 start_impala_cluster() {
-  # TODO-MT: remove --unlock_mt_dop when it is no longer needed.
+  # TODO: IMPALA-9812: remove --unlock_mt_dop when it is no longer needed.
   run-step "Starting Impala cluster" start-impala-cluster.log \
       "${IMPALA_HOME}/bin/start-impala-cluster.py" \
       --log_dir="${IMPALA_EE_TEST_LOGS_DIR}" \
@@ -225,7 +225,6 @@ do
         TEST_RET_CODE=1
       fi
       # Restart the minicluster after running the FE custom cluster tests.
-      # TODO-MT: remove --unlock_mt_dop when it is no longer needed.
       start_impala_cluster
     fi
     popd
diff --git a/common/thrift/ExecStats.thrift b/common/thrift/ExecStats.thrift
index d059d24..303d28c 100644
--- a/common/thrift/ExecStats.thrift
+++ b/common/thrift/ExecStats.thrift
@@ -47,7 +47,6 @@ struct TExecStats {
 
   // Total CPU time spent across all threads. For operators that have an async
   // component (e.g. multi-threaded) this will be >= latency_ns.
-  // TODO-MT: remove this or latency_ns
   2: optional i64 cpu_time_ns
 
   // Number of rows returned.

[impala] 02/02: IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad

Posted by ta...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 37b5599a7a25536a6e5265788cf4e139db3b0cfa
Author: Yongzhi Chen <yc...@cloudera.com>
AuthorDate: Mon Jun 1 16:58:13 2020 -0400

    IMPALA-9809: Multi-aggregation query on particular dataset crashes impalad
    
    In streaming-aggregation-node.cc , when replicate_input_ is true
    and num_aggs > 1, it will call AddBatchStreaming several
    times(more than 1), each time, the out_batch will be used.
    If a row is not cached, the value will be saved in the out_batch,
    and out_batch's row count will be increased.
    The row_count did not set back to 0 when next while loop. Therefore
    in out_batch, it is possible that not all the tuples are non-null.
    (For example the rows added when agg_idx = 1, only tuple with 1 not
    null; the rows added when when agg_idx = 2, only tuple with 2 not
    null). But in grouping-aggregation-ir.cc, the serialize out code is
    start from very beginning of out_batch for a agg_idx, it has good
    chance to hit null tuple.
    
    Fix the issue by only serialize the tuples being added by
    current function call.
    
    Tests:
    Manual tests
    Unit tests
    
    Change-Id: I06d73171cdc40bdbb15960573030ac7fc94a7e16
    Reviewed-on: http://gerrit.cloudera.org:8080/16019
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/grouping-aggregator-ir.cc                    |  8 +++++---
 .../tpch/queries/min-multiple-distinct-aggs.test         | 16 ++++++++++++++++
 tests/query_test/test_aggregation.py                     |  3 +++
 3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/be/src/exec/grouping-aggregator-ir.cc b/be/src/exec/grouping-aggregator-ir.cc
index d3c7f1a..fb70e3b 100644
--- a/be/src/exec/grouping-aggregator-ir.cc
+++ b/be/src/exec/grouping-aggregator-ir.cc
@@ -157,8 +157,8 @@ Status GroupingAggregator::AddBatchStreamingImpl(int agg_idx, bool needs_seriali
     HashTableCtx* __restrict__ ht_ctx, int remaining_capacity[PARTITION_FANOUT]) {
   DCHECK(is_streaming_preagg_);
   DCHECK(!out_batch->AtCapacity());
-
-  RowBatch::Iterator out_batch_iterator(out_batch, out_batch->num_rows());
+  const int out_batch_start = out_batch->num_rows();
+  RowBatch::Iterator out_batch_iterator(out_batch, out_batch_start);
   HashTableCtx::ExprValuesCache* expr_vals_cache = ht_ctx->expr_values_cache();
   const int num_rows = in_batch->num_rows();
   const int cache_size = expr_vals_cache->capacity();
@@ -200,7 +200,9 @@ Status GroupingAggregator::AddBatchStreamingImpl(int agg_idx, bool needs_seriali
   streaming_idx_ = 0;
 ret:
   if (needs_serialize) {
-    FOREACH_ROW(out_batch, 0, out_batch_iter) {
+    // We only serialize the added rows in this call. The rows before out_batch_start may
+    // be added for other agg_idx values.
+    FOREACH_ROW(out_batch, out_batch_start, out_batch_iter) {
       AggFnEvaluator::Serialize(agg_fn_evals_, out_batch_iter.Get()->GetTuple(agg_idx));
     }
   }
diff --git a/testdata/workloads/tpch/queries/min-multiple-distinct-aggs.test b/testdata/workloads/tpch/queries/min-multiple-distinct-aggs.test
new file mode 100644
index 0000000..d3434a2
--- /dev/null
+++ b/testdata/workloads/tpch/queries/min-multiple-distinct-aggs.test
@@ -0,0 +1,16 @@
+====
+---- QUERY
+select
+l_orderkey,
+min(case when l_shipdate >'1996-03-30' then l_shipdate else null end) as flt,
+count(distinct case when l_linenumber >70 then l_partkey end) as cl,
+count(distinct l_partkey) as cnl
+from lineitem
+group by l_orderkey order by l_orderkey limit 2;
+---- TYPES
+BIGINT,STRING,BIGINT,BIGINT
+---- RESULTS
+1,'1996-04-12',0,6
+2,'1997-01-28',0,1
+====
+
diff --git a/tests/query_test/test_aggregation.py b/tests/query_test/test_aggregation.py
index 62e9ee7..10d63ec 100644
--- a/tests/query_test/test_aggregation.py
+++ b/tests/query_test/test_aggregation.py
@@ -435,3 +435,6 @@ class TestTPCHAggregationQueries(ImpalaTestSuite):
 
   def test_tpch_stress(self, vector):
     self.run_test_case('tpch-stress-aggregations', vector)
+
+  def test_min_multiple_distinct(self, vector, unique_database):
+    self.run_test_case('min-multiple-distinct-aggs', vector)