Posted to commits@spark.apache.org by do...@apache.org on 2019/07/28 01:56:19 UTC
[spark] branch branch-2.4 updated: [SPARK-28545][SQL] Add the hash map size to the directional log of ObjectAggregationIterator

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 8934560  [SPARK-28545][SQL] Add the hash map size to the directional log of ObjectAggregationIterator
8934560 is described below

commit 89345609b9ced9ad6ce164904a88cf92a7e8a05e
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Sat Jul 27 18:55:36 2019 -0700

    [SPARK-28545][SQL] Add the hash map size to the directional log of ObjectAggregationIterator
    
    ## What changes were proposed in this pull request?
    
    `ObjectAggregationIterator` shows a directional info message to increase `spark.sql.objectHashAggregate.sortBased.fallbackThreshold` when the size of the in-memory hash map grows too large and it falls back to sort-based aggregation.
    However, the message does not say how far the threshold needs to be raised. This PR adds the current size of the in-memory hash map to the log message.
    
    **BEFORE**
    ```
    15:21:41.669 Executor task launch worker for task 0 INFO
    ObjectAggregationIterator: Aggregation hash map reaches threshold capacity (2 entries), ...
    ```
    
    **AFTER**
    ```
    15:20:05.742 Executor task launch worker for task 0 INFO
    ObjectAggregationIterator: Aggregation hash map size 2 reaches threshold capacity (2 entries), ...
    ```
    
    ## How was this patch tested?
    
    Manual. For example, run the `typed_count fallback to sort-based aggregation` test in `ObjectHashAggregateSuite.scala` and search for the above message in `target/unit-tests.log`.
    
    Closes #25276 from dongjoon-hyun/SPARK-28545.
    
    Authored-by: Dongjoon Hyun <dh...@apple.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
    (cherry picked from commit d943ee0a881540aa356cdce533b693baaf7c644f)
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .../spark/sql/execution/aggregate/ObjectAggregationIterator.scala     | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
index 43514f5..b88ddba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
@@ -161,9 +161,9 @@ class ObjectAggregationIterator(
         // The the hash map gets too large, makes a sorted spill and clear the map.
         if (hashMap.size >= fallbackCountThreshold) {
           logInfo(
-            s"Aggregation hash map reaches threshold " +
+            s"Aggregation hash map size ${hashMap.size} reaches threshold " +
               s"capacity ($fallbackCountThreshold entries), spilling and falling back to sort" +
-              s" based aggregation. You may change the threshold by adjust option " +
+              " based aggregation. You may change the threshold by adjust option " +
               SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key
           )
 

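The patched check can be tried outside Spark with a small stand-alone sketch. It is not the Spark implementation: `FallbackLogSketch`, `ThresholdKey`, and `fallbackMessage` are hypothetical names introduced here for illustration, and the sketch assumes that `SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key` resolves to the option named in the commit message. The message text is copied verbatim from the diff, including its original wording.

```scala
// Stand-alone sketch; it does not depend on Spark.
// All names below are hypothetical and only mirror the diff above.
object FallbackLogSketch {
  // Assumed value of SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key,
  // taken from the option name quoted in the commit message.
  val ThresholdKey: String =
    "spark.sql.objectHashAggregate.sortBased.fallbackThreshold"

  // Returns the patched log message once the map size reaches the
  // threshold, and None while the map is still below it.
  def fallbackMessage(mapSize: Int, fallbackCountThreshold: Int): Option[String] =
    if (mapSize >= fallbackCountThreshold) {
      Some(
        s"Aggregation hash map size $mapSize reaches threshold " +
          s"capacity ($fallbackCountThreshold entries), spilling and falling back to sort" +
          " based aggregation. You may change the threshold by adjust option " +
          ThresholdKey)
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for the in-memory aggregation hash map (key -> count).
    val hashMap = scala.collection.mutable.HashMap.empty[String, Long]
    Seq("a", "b").foreach { key =>
      hashMap(key) = hashMap.getOrElse(key, 0L) + 1L
    }
    // With two distinct keys and a threshold of 2, the message is produced.
    fallbackMessage(hashMap.size, fallbackCountThreshold = 2).foreach(println)
  }
}
```

Because the check is `>=`, the logged size equals the threshold in this sketch, which matches the `size 2 ... (2 entries)` line in the AFTER example. In a running session the option can be raised through the usual runtime configuration, for example `spark.conf.set(FallbackLogSketch.ThresholdKey, 1024L)`, where `spark` is an existing `SparkSession` and the value 1024 is purely illustrative.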
