You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by ya...@apache.org on 2019/03/27 01:42:55 UTC
[spark] branch master updated: [SPARK-26660][FOLLOWUP] Raise task
serialized size warning threshold to 1000 KiB
This is an automated email from the ASF dual-hosted git repository.
yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 3a8398d [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB
3a8398d is described below
commit 3a8398df5cf87f597e672bfbb8c6eadbad800d03
Author: Sean Owen <se...@databricks.com>
AuthorDate: Wed Mar 27 10:42:26 2019 +0900
[SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB
## What changes were proposed in this pull request?
Raise the threshold size for serialized task size at which a warning is generated from 100KiB to 1000KiB.
As several people have noted, the original change for this JIRA highlighted that this threshold is low. Test output regularly shows:
```
- sorting on StringType with nullable=false, sortOrder=List('a DESC NULLS LAST)
22:47:53.320 WARN org.apache.spark.scheduler.TaskSetManager: Stage 80 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
22:47:53.348 WARN org.apache.spark.scheduler.TaskSetManager: Stage 81 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
22:47:53.417 WARN org.apache.spark.scheduler.TaskSetManager: Stage 83 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
22:47:53.444 WARN org.apache.spark.scheduler.TaskSetManager: Stage 84 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
...
- SPARK-20688: correctly check analysis for scalar sub-queries
22:49:10.314 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.8 KiB
- SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1
22:49:10.595 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
22:49:10.744 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
22:49:10.894 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
- SPARK-21835: Join in correlated subquery should be duplicateResolved: case 2
- SPARK-21835: Join in correlated subquery should be duplicateResolved: case 3
- SPARK-23316: AnalysisException after max iteration reached for IN query
22:49:11.559 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 154.2 KiB
```
It seems that a larger threshold of about 1MB is more suitable.
## How was this patch tested?
Existing tests.
Closes #24226 from srowen/SPARK-26660.2.
Authored-by: Sean Owen <se...@databricks.com>
Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
index ea31fe8..3977c0b 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@@ -1111,5 +1111,5 @@ private[spark] class TaskSetManager(
private[spark] object TaskSetManager {
// The user will be warned if any stages contain a task that has a serialized size greater than
// this.
- val TASK_SIZE_TO_WARN_KIB = 100
+ val TASK_SIZE_TO_WARN_KIB = 1000
}
---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org