Posted to issues@spark.apache.org by "Ye Zhou (Jira)" <ji...@apache.org> on 2021/05/28 00:53:00 UTC

[jira] [Created] (SPARK-35546) Handling race condition and memory leak in RemoteBlockPushResolver

Ye Zhou created SPARK-35546:
-------------------------------

             Summary: Handling race condition and memory leak in RemoteBlockPushResolver
                 Key: SPARK-35546
                 URL: https://issues.apache.org/jira/browse/SPARK-35546
             Project: Spark
          Issue Type: Sub-task
          Components: Shuffle
    Affects Versions: 3.1.0
            Reporter: Ye Zhou


The current implementation of RemoteBlockPushResolver uses two ConcurrentHashMaps to store (1) applicationId -> mergedShuffleLocalDirPath and (2) applicationId+attemptId+shuffleID -> mergedShufflePartitionInfo. Four types of messages (ExecutorRegister, PushBlocks, FinalizeShuffleMerge and ApplicationRemove) trigger different operations on these two maps, so the information stored in them must be kept strongly consistent. Otherwise, the shuffle server will suffer either data corruption/correctness issues or a memory leak.
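
To make the hazard concrete, here is a minimal sketch of how a check on one map followed by a write to the other can leak an entry. It is a simplified, hypothetical model and not the actual RemoteBlockPushResolver code: the class name, the method names, the string key format and the String values are all assumptions made for illustration. The interleaving is replayed sequentially so the outcome is deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical, simplified model of the two-map layout; not Spark's actual classes.
class TwoMapLeakSketch {
    // Map 1: applicationId -> merged shuffle local dir path
    final ConcurrentMap<String, String> appDirs = new ConcurrentHashMap<>();
    // Map 2: "appId_attemptId_shuffleId" -> partition info (a String stands in for the real value)
    final ConcurrentMap<String, String> partitions = new ConcurrentHashMap<>();

    void executorRegister(String appId, String dir) {
        appDirs.put(appId, dir);
    }

    // Step 1 of handling a push: check that the application is registered.
    boolean isRegistered(String appId) {
        return appDirs.containsKey(appId);
    }

    // Step 2 of handling a push: record partition state. Not atomic with step 1.
    void recordPartition(String appId, int attemptId, int shuffleId) {
        partitions.put(appId + "_" + attemptId + "_" + shuffleId, "partition-info");
    }

    void applicationRemove(String appId) {
        appDirs.remove(appId);
        partitions.keySet().removeIf(k -> k.startsWith(appId + "_"));
    }

    public static void main(String[] args) {
        TwoMapLeakSketch r = new TwoMapLeakSketch();
        r.executorRegister("app1", "/tmp/merge");
        boolean ok = r.isRegistered("app1");      // PushBlocks passes its check
        r.applicationRemove("app1");              // ApplicationRemove runs in between
        if (ok) r.recordPartition("app1", 0, 7);  // PushBlocks resumes with a stale check
        // The application is gone from map 1, but map 2 still holds an entry for it.
        System.out.println(r.appDirs.size() + " " + r.partitions.size()); // prints "0 1"
    }
}
```

Each map is individually thread-safe, yet the entry written in step 2 is never cleaned up, because the removal that would have cleaned it up has already run.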

We should come up with a systematic way to resolve this, rather than spot-fixing the potential issues.
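
One possible shape for such a systematic approach, offered only as a sketch and not as the fix adopted for this issue, is to keep all per-application state under a single map entry and route every message through that map's atomic per-key operations. The class AppState, the method names, the key format and the assumption that finalization drops the partition entry are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: one map keyed by applicationId owns all state for that application.
class SingleMapResolverSketch {
    static final class AppState {
        final String mergeDir;
        // "attemptId_shuffleId" -> partition info (a String stands in for the real value)
        final ConcurrentMap<String, String> partitions = new ConcurrentHashMap<>();
        AppState(String mergeDir) { this.mergeDir = mergeDir; }
    }

    final ConcurrentMap<String, AppState> apps = new ConcurrentHashMap<>();

    void executorRegister(String appId, String dir) {
        apps.computeIfAbsent(appId, id -> new AppState(dir));
    }

    // Returns false when the application is unknown or already removed.
    // The check and the write happen inside one atomic computeIfPresent call.
    boolean pushBlocks(String appId, int attemptId, int shuffleId) {
        return apps.computeIfPresent(appId, (id, st) -> {
            st.partitions.putIfAbsent(attemptId + "_" + shuffleId, "partition-info");
            return st;
        }) != null;
    }

    // Assumption: finalizing a shuffle merge drops its partition entry.
    boolean finalizeShuffleMerge(String appId, int attemptId, int shuffleId) {
        boolean[] removed = {false};
        apps.computeIfPresent(appId, (id, st) -> {
            removed[0] = st.partitions.remove(attemptId + "_" + shuffleId) != null;
            return st;
        });
        return removed[0];
    }

    // Removing the single entry discards the directory path and all partition state together.
    void applicationRemove(String appId) {
        apps.remove(appId);
    }

    public static void main(String[] args) {
        SingleMapResolverSketch r = new SingleMapResolverSketch();
        r.executorRegister("app1", "/tmp/merge");
        boolean p1 = r.pushBlocks("app1", 0, 7);
        boolean fin = r.finalizeShuffleMerge("app1", 0, 7);
        r.applicationRemove("app1");
        boolean p2 = r.pushBlocks("app1", 0, 7);  // rejected: nothing is recorded after removal
        System.out.println(p1 + " " + fin + " " + p2 + " " + r.apps.size()); // prints "true true false 0"
    }
}
```

Because ConcurrentHashMap serializes computeIfPresent and remove on the same key, a push can no longer slip between an application's removal and the cleanup of its partition state. A real implementation would also have to close files and delete directories, which this sketch leaves out.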



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
