Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/02/03 11:25:52 UTC

[jira] [Resolved] (SPARK-19438) executorDataMap should be guarded by CoarseGrainedSchedulerBackend.this.synchronized

     [ https://issues.apache.org/jira/browse/SPARK-19438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen resolved SPARK-19438.
-------------------------------
    Resolution: Not A Problem

> executorDataMap should be guarded by CoarseGrainedSchedulerBackend.this.synchronized 
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-19438
>                 URL: https://issues.apache.org/jira/browse/SPARK-19438
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0
>            Reporter: jin xing
>
> Currently, when handling *RegisterExecutor* in *CoarseGrainedSchedulerBackend*, *executorDataMap* is guarded by *CoarseGrainedSchedulerBackend.this.synchronized* only when it is updated, not when it is checked for an existing executor ID. This can leave *numPendingExecutors* incorrect.
> The code looks like this:
> {code}
>         if (executorDataMap.contains(executorId)) {
>           executorRef.send(RegisterExecutorFailed("Duplicate executor ID: " + executorId))
>           context.reply(true)
>         } else {
>           ...
>           CoarseGrainedSchedulerBackend.this.synchronized {
>             executorDataMap.put(executorId, data)
>             if (currentExecutorIdCounter < executorId.toInt) {
>               currentExecutorIdCounter = executorId.toInt
>             }
>             if (numPendingExecutors > 0) {
>               numPendingExecutors -= 1
>               logDebug(s"Decremented number of pending executors ($numPendingExecutors left)")
>             }
>           }
> {code}
> Consider SPARK-19437 and the following scenario:
> An executor sends *RegisterExecutor* twice via *askWithRetry*, with a very short interval between the two messages. Both messages may then reach the *else* branch, so *numPendingExecutors* is decremented twice. Currently, *askWithRetry* is used for *RegisterExecutor* only in some unit tests, but it makes sense to make the handling of *RegisterExecutor* more robust.
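
For illustration, the check-then-act pattern described in the report can be closed by performing the duplicate check and the update under the same lock. The following is a minimal, self-contained sketch and not Spark's actual code: the names ExecutorRegistry, register, pending, and RegistryDemo are hypothetical, the map stores a plain String in place of the real executor data, and the sketch does not model how Spark's RPC layer delivers messages to the endpoint.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable

// Hypothetical stand-in for the backend's registration state (not Spark's class).
final class ExecutorRegistry(initialPending: Int) {
  private val executorDataMap = mutable.HashMap.empty[String, String]
  private var numPendingExecutors = initialPending

  /** Returns true if the executor was newly registered, false for a duplicate ID. */
  def register(executorId: String, data: String): Boolean = synchronized {
    // The duplicate check and the update share one lock, so two concurrent
    // calls with the same ID cannot both reach the else branch.
    if (executorDataMap.contains(executorId)) {
      false
    } else {
      executorDataMap.put(executorId, data)
      if (numPendingExecutors > 0) numPendingExecutors -= 1
      true
    }
  }

  def pending: Int = synchronized(numPendingExecutors)
}

object RegistryDemo {
  // Sends the same registration from two threads, as a retried request might.
  // Returns (number of accepted registrations, remaining pending count).
  def run(): (Int, Int) = {
    val registry = new ExecutorRegistry(initialPending = 2)
    val accepted = new AtomicInteger(0)
    val threads = (1 to 2).map { _ =>
      new Thread(new Runnable {
        def run(): Unit = {
          if (registry.register("1", "host-a")) accepted.incrementAndGet()
          ()
        }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join())
    (accepted.get, registry.pending)
  }
}
```

Whichever thread wins the lock registers the executor; the other sees the existing entry and is rejected, so the pending count is decremented exactly once regardless of interleaving.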



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
