Posted to issues@spark.apache.org by "L. C. Hsieh (Jira)" <ji...@apache.org> on 2019/10/04 03:49:00 UTC
[jira] [Resolved] (SPARK-29351) Avoid full synchronization in ShuffleMapStage
[ https://issues.apache.org/jira/browse/SPARK-29351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
L. C. Hsieh resolved SPARK-29351.
---------------------------------
Resolution: Resolved
> Avoid full synchronization in ShuffleMapStage
> ---------------------------------------------
>
> Key: SPARK-29351
> URL: https://issues.apache.org/jira/browse/SPARK-29351
> Project: Spark
> Issue Type: New Feature
> Components: Spark Core
> Affects Versions: 2.4.4
> Environment: #
> Reporter: DB Tsai
> Assignee: DB Tsai
> Priority: Major
> Fix For: 3.0.0
>
>
> In one of our production streaming jobs, which has more than 1k executors with 20 cores each, Spark spends a significant portion of time (30s) sending out the `ShuffleStatus`. We found two issues.
> # The driver's message loop calls `serializedMapStatus`, which runs inside a synchronized block. When the job scales up, this can cause contention.
> # When the job is big, the `MapStatus` is huge as well, so serialization and compression are slow.
> This work aims to address the first problem.
>
>
>
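To illustrate the first problem, here is a minimal sketch of one common way to avoid full synchronization around a cached serialized value: a read/write lock with a fast read path. It is not taken from the Spark patch. The class `CachedStatus`, its fields, and the use of `Arrays.toString` as a stand-in for real serialization and compression are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a per-shuffle status holder; not Spark's actual class. */
class CachedStatus {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final String[] statuses;      // one entry per map task (assumed layout)
    private byte[] cachedSerialized;      // null whenever the cache is stale
    final AtomicInteger serializations = new AtomicInteger(); // counts cache rebuilds

    CachedStatus(int numMapTasks) {
        this.statuses = new String[numMapTasks];
    }

    /** Writers take the exclusive lock and invalidate the cache. */
    void update(int mapIndex, String status) {
        lock.writeLock().lock();
        try {
            statuses[mapIndex] = status;
            cachedSerialized = null;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Many readers may return the cached bytes concurrently. */
    byte[] serialized() {
        // Fast path: shared lock only, so readers do not block each other.
        lock.readLock().lock();
        try {
            if (cachedSerialized != null) {
                return cachedSerialized;
            }
        } finally {
            lock.readLock().unlock();
        }
        // Slow path: exclusive lock, then re-check because another thread
        // may have rebuilt the cache between the two lock acquisitions.
        lock.writeLock().lock();
        try {
            if (cachedSerialized == null) {
                serializations.incrementAndGet();
                // Placeholder for the real serialization and compression step.
                cachedSerialized =
                    Arrays.toString(statuses).getBytes(StandardCharsets.UTF_8);
            }
            return cachedSerialized;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

With a single `synchronized` block, every request for the serialized status waits on the same monitor even when the cached bytes are already available. In this sketch only updates and cache rebuilds are exclusive, so repeated reads of an unchanged status proceed in parallel.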