Posted to commits@spark.apache.org by ya...@apache.org on 2019/03/27 01:42:55 UTC

[spark] branch master updated: [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB

This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 3a8398d  [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB
3a8398d is described below

commit 3a8398df5cf87f597e672bfbb8c6eadbad800d03
Author: Sean Owen <se...@databricks.com>
AuthorDate: Wed Mar 27 10:42:26 2019 +0900

    [SPARK-26660][FOLLOWUP] Raise task serialized size warning threshold to 1000 KiB
    
    ## What changes were proposed in this pull request?
    
    Raise the serialized task size threshold at which a warning is generated from 100 KiB to 1000 KiB.
    
    As several people have noted, the original change for this JIRA highlighted that this threshold is low. Test output regularly shows:
    
    ```
    - sorting on StringType with nullable=false, sortOrder=List('a DESC NULLS LAST)
    22:47:53.320 WARN org.apache.spark.scheduler.TaskSetManager: Stage 80 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.348 WARN org.apache.spark.scheduler.TaskSetManager: Stage 81 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.417 WARN org.apache.spark.scheduler.TaskSetManager: Stage 83 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    22:47:53.444 WARN org.apache.spark.scheduler.TaskSetManager: Stage 84 contains a task of very large size (755 KiB). The maximum recommended task size is 100 KiB.
    
    ...
    
    - SPARK-20688: correctly check analysis for scalar sub-queries
    22:49:10.314 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.8 KiB
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 1
    22:49:10.595 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    22:49:10.744 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    22:49:10.894 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 150.7 KiB
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 2
    - SPARK-21835: Join in correlated subquery should be duplicateResolved: case 3
    - SPARK-23316: AnalysisException after max iteration reached for IN query
    22:49:11.559 WARN org.apache.spark.scheduler.DAGScheduler: Broadcasting large task binary with size 154.2 KiB
    ```
    
    A larger threshold of about 1 MB (1000 KiB) seems more suitable.
    
    ## How was this patch tested?
    
    Existing tests.
    
    Closes #24226 from srowen/SPARK-26660.2.
    
    Authored-by: Sean Owen <se...@databricks.com>
    Signed-off-by: Takeshi Yamamuro <ya...@apache.org>
---
 core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
index ea31fe8..3977c0b 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala
@@ -1111,5 +1111,5 @@ private[spark] class TaskSetManager(
 private[spark] object TaskSetManager {
   // The user will be warned if any stages contain a task that has a serialized size greater than
   // this.
-  val TASK_SIZE_TO_WARN_KIB = 100
+  val TASK_SIZE_TO_WARN_KIB = 1000
 }
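
For illustration, the warning path can be sketched as follows. This is a minimal, self-contained sketch of how a scheduler might compare a task's serialized size against the `TASK_SIZE_TO_WARN_KIB` constant changed in this patch; the object and method names other than the constant are hypothetical, not the actual `TaskSetManager` code.

```scala
// Sketch of the large-task warning check. Only TASK_SIZE_TO_WARN_KIB
// comes from the patch above; the rest is an illustrative assumption.
object TaskSizeWarning {
  // Threshold raised from 100 to 1000 KiB by this commit.
  val TASK_SIZE_TO_WARN_KIB = 1000

  // Returns the warning message for an oversized task, or None.
  def warnIfLarge(stageId: Int, serializedSizeBytes: Long): Option[String] = {
    val sizeKib = serializedSizeBytes / 1024
    if (sizeKib > TASK_SIZE_TO_WARN_KIB) {
      Some(s"Stage $stageId contains a task of very large size " +
        s"($sizeKib KiB). The maximum recommended task size is " +
        s"$TASK_SIZE_TO_WARN_KIB KiB.")
    } else None
  }
}
```

Under the new threshold, the 755 KiB tasks seen in the test output above no longer trigger the warning, while tasks over 1000 KiB still do.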


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org