Posted to commits@spark.apache.org by do...@apache.org on 2019/07/28 01:56:19 UTC
[spark] branch branch-2.4 updated: [SPARK-28545][SQL] Add the hash
map size to the directional log of ObjectAggregationIterator
This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-2.4 by this push:
new 8934560 [SPARK-28545][SQL] Add the hash map size to the directional log of ObjectAggregationIterator
8934560 is described below
commit 89345609b9ced9ad6ce164904a88cf92a7e8a05e
Author: Dongjoon Hyun <dh...@apple.com>
AuthorDate: Sat Jul 27 18:55:36 2019 -0700
[SPARK-28545][SQL] Add the hash map size to the directional log of ObjectAggregationIterator
## What changes were proposed in this pull request?
`ObjectAggregationIterator` shows a directional info message to increase `spark.sql.objectHashAggregate.sortBased.fallbackThreshold` when the size of the in-memory hash map grows too large and it falls back to sort-based aggregation.
However, the message does not say how large the map was, so we don't know how much we need to increase the threshold. This PR adds the current in-memory hash map size to the log message.
**BEFORE**
```
15:21:41.669 Executor task launch worker for task 0 INFO
ObjectAggregationIterator: Aggregation hash map reaches threshold capacity (2 entries), ...
```
**AFTER**
```
15:20:05.742 Executor task launch worker for task 0 INFO
ObjectAggregationIterator: Aggregation hash map size 2 reaches threshold capacity (2 entries), ...
```
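As a rough illustration of the check behind this message, the sketch below rebuilds the new log text outside of Spark. `FallbackLogSketch`, `fallbackMessage`, and `thresholdKey` are hypothetical names used only for this example; the real code lives in `ObjectAggregationIterator` and reads the key from `SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key`.

```scala
// Minimal sketch, not the Spark implementation: it only mirrors the
// condition and message format shown in this commit.
object FallbackLogSketch {
  // Assumption: stands in for SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key
  val thresholdKey = "spark.sql.objectHashAggregate.sortBased.fallbackThreshold"

  // Returns the log message when the map has reached the threshold, else None.
  def fallbackMessage(mapSize: Int, fallbackCountThreshold: Int): Option[String] =
    if (mapSize >= fallbackCountThreshold) {
      Some(
        s"Aggregation hash map size $mapSize reaches threshold " +
          s"capacity ($fallbackCountThreshold entries), spilling and falling back to sort" +
          " based aggregation. You may change the threshold by adjust option " +
          thresholdKey)
    } else {
      None
    }

  def main(args: Array[String]): Unit = {
    // At the threshold: a message is produced.
    fallbackMessage(2, 2).foreach(println)
    // Below the threshold: nothing is logged.
    println(fallbackMessage(1, 2).isEmpty)
  }
}
```

In a running session, the threshold named in the message could then be raised with something like `spark.conf.set("spark.sql.objectHashAggregate.sortBased.fallbackThreshold", "<new value>")`, choosing the value with the logged map size in mind.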
## How was this patch tested?
Manual. For example, run the `typed_count fallback to sort-based aggregation` test in `ObjectHashAggregateSuite.scala` and search for the above message in `target/unit-tests.log`.
Closes #25276 from dongjoon-hyun/SPARK-28545.
Authored-by: Dongjoon Hyun <dh...@apple.com>
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
(cherry picked from commit d943ee0a881540aa356cdce533b693baaf7c644f)
Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
.../spark/sql/execution/aggregate/ObjectAggregationIterator.scala | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
index 43514f5..b88ddba 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/ObjectAggregationIterator.scala
@@ -161,9 +161,9 @@ class ObjectAggregationIterator(
// The the hash map gets too large, makes a sorted spill and clear the map.
if (hashMap.size >= fallbackCountThreshold) {
logInfo(
- s"Aggregation hash map reaches threshold " +
+ s"Aggregation hash map size ${hashMap.size} reaches threshold " +
s"capacity ($fallbackCountThreshold entries), spilling and falling back to sort" +
- s" based aggregation. You may change the threshold by adjust option " +
+ " based aggregation. You may change the threshold by adjust option " +
SQLConf.OBJECT_AGG_SORT_BASED_FALLBACK_THRESHOLD.key
)