You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by jerryshao <gi...@git.apache.org> on 2015/11/13 06:59:26 UTC

[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killin...

GitHub user jerryshao opened a pull request:

    https://github.com/apache/spark/pull/9684

    [SPARK-11718][Yarn][Core]Fix explicitly killing executor dies silently issue

    Currently if dynamic allocation is enabled, explicitly killing executor will not get response, so the executor metadata is wrong in driver side. Which will make dynamic allocation on Yarn fail to work.
    
    The problem is  `disableExecutor` returns false for pending killing executors when `onDisconnect` is detected, so no further implementation is done.
    
    One solution is to bypass these explicitly killed executors to use `super.onDisconnect` to remove executor. This is simple.
    
    Another solution is still querying the loss reason for these explicitly kill executors. Since executor may get killed and informed in the same AM-RM communication, so current way of adding pending loss reason request is not worked (container complete is already processed), here we should store this loss reason for later query.
    
    Here this PR chooses solution 2.
    
    Please help to review. @vanzin I think this part is changed by you previously, would you please help to review? Thanks a lot.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jerryshao/apache-spark SPARK-11718

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9684.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9684
    
----
commit 8719ee480fe00ecaa85f3f08c8a8bc578a226bcc
Author: jerryshao <ss...@hortonworks.com>
Date:   2015-11-13T05:47:46Z

    Fix explicitly killing executor dies silently issue

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156793955
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156811167
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45948/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156332450
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156358866
  
    **[Test build #45846 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45846/consoleFull)** for PR 9684 at commit [`85d18d2`](https://github.com/apache/spark/commit/85d18d2dcec4d6d46bddd7b4bf6ad264e40d802a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156604665
  
    Just minor things, otherwise LGTM.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killin...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44753574
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -538,6 +549,11 @@ private[yarn] class YarnAllocator(
         if (executorIdToContainer.contains(eid)) {
           pendingLossReasonRequests
             .getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
    +    } else if (releasedExecutorLossReasons.get(eid) != null) {
    +      // Executor is already released explicitly before getting the loss reason, so directly send
    +      // the pre-stored lost reason
    +      context.reply(releasedExecutorLossReasons.get(eid))
    +      releasedExecutorLossReasons.remove(eid)
         } else {
           logWarning(s"Tried to get the loss reason for non-existent executor $eid")
    --- End diff --
    
    Note: here it would also be better to response a message, otherwise driver will pending on waiting the message until timeout. This may happens when issue a loss reason query, AM already processed this completed containers (since they asynchronously).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156380975
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156394274
  
    **[Test build #45848 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45848/consoleFull)** for PR 9684 at commit [`0ae6613`](https://github.com/apache/spark/commit/0ae6613af3cbd435559d871792ebfdf1ba867cf3).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156380977
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45846/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156380863
  
    **[Test build #45846 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45846/consoleFull)** for PR 9684 at commit [`85d18d2`](https://github.com/apache/spark/commit/85d18d2dcec4d6d46bddd7b4bf6ad264e40d802a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156600658
  
    Hi @vanzin , I have no strong inclination on either solution, I can change to the first solution.
    
    The problem here with first solution is that the executor loss reason is `SlaveLost`, not actually `ExecutorExited`, so it is not well informed to the user, of course I could change the loss reason to make it more reasonable.
    
    Besides, still I think there's a race condition between querying loss reason and process completed containers, since they're in two threads, here we assume processing completed containers should be later than querying loss reason, but I don't think we could guarantee this. Maybe we could receive `onDisconnected` later than processing completed containers. I think it is possible, and in such case we're failed to get loss reason, so here in this PR I changed to store loss reason if processing completed containers is ahead of querying, unless we could accept failing to get loss reason. What's your opinion?
    
    Thanks a lot.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357307
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156394377
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45848/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156362458
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357018
  
    **[Test build #45837 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45837/consoleFull)** for PR 9684 at commit [`85d18d2`](https://github.com/apache/spark/commit/85d18d2dcec4d6d46bddd7b4bf6ad264e40d802a).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44851850
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -96,6 +96,11 @@ private[yarn] class YarnAllocator(
       // was lost.
       private val pendingLossReasonRequests = new HashMap[String, mutable.Buffer[RpcCallContext]]
     
    +  // Executor loss reason for explicitly released executors, it will be added when executor loss
    --- End diff --
    
    Given the race you described, this is not just for explicitly released executors, but for any executor whose loss reason hasn't been queried by the driver yet.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156394373
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156603725
  
    > Besides, still I think there's a race condition between querying loss reason and process completed containers, 
    
    That's true. It's probably unlikely that it would be hit (since the "executor disconnects, ask for loss reason" path is much quicker than the "nm sees process has died, nm notifies rm, spark am wakes up from sleep and sends heartbeat to rm and sees container has exited" path), But better be safe.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by jerryshao <gi...@git.apache.org>.
Github user jerryshao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44851960
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -96,6 +96,11 @@ private[yarn] class YarnAllocator(
       // was lost.
       private val pendingLossReasonRequests = new HashMap[String, mutable.Buffer[RpcCallContext]]
     
    +  // Executor loss reason for explicitly released executors, it will be added when executor loss
    --- End diff --
    
    Yes, the true. But at the initial test in my local environment, explicitly release executors will easily hit this issue. So I just commented like this, I will change it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156802883
  
    **[Test build #45948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45948/consoleFull)** for PR 9684 at commit [`5347734`](https://github.com/apache/spark/commit/53477342d9fadc9f1da365805b81688f9b8ee5bd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44851653
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala ---
    @@ -269,13 +269,16 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
          * Stop making resource offers for the given executor. The executor is marked as lost with
          * the loss reason still pending.
          *
    -     * @return Whether executor was alive.
    +     * @return Whether executor should be disabled
          */
         protected def disableExecutor(executorId: String): Boolean = {
           val shouldDisable = CoarseGrainedSchedulerBackend.this.synchronized {
             if (executorIsAlive(executorId)) {
               executorsPendingLossReason += executorId
               true
    +        } else if (executorsPendingToRemove.contains(executorId)) {
    --- End diff --
    
    this could just be:
    
        } else {
          executorsPendingToRemove.contains(executorId)
        }



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156362681
  
    **[Test build #45848 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45848/consoleFull)** for PR 9684 at commit [`0ae6613`](https://github.com/apache/spark/commit/0ae6613af3cbd435559d871792ebfdf1ba867cf3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44851898
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -538,6 +551,11 @@ private[yarn] class YarnAllocator(
         if (executorIdToContainer.contains(eid)) {
           pendingLossReasonRequests
             .getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
    +    } else if (releasedExecutorLossReasons.get(eid) != null) {
    +      // Executor is already released explicitly before getting the loss reason, so directly send
    +      // the pre-stored lost reason
    +      context.reply(releasedExecutorLossReasons.get(eid))
    --- End diff --
    
    You could just use `remove(eid)` here, then you don't need the next line.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357077
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45837/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357076
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357690
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156362475
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156525606
  
    > One solution is to bypass these explicitly killed executors to use super.onDisconnect to remove executor. This is simple.
    
    Is there any reason why that solution wouldn't work? It feels unnecessary to ask for the loss reason of an executor you asked to be killed; it would also simplify the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156357699
  
    Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156332437
  
     Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156811116
  
    **[Test build #45948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45948/consoleFull)** for PR 9684 at commit [`5347734`](https://github.com/apache/spark/commit/53477342d9fadc9f1da365805b81688f9b8ee5bd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9684#issuecomment-156335592
  
    **[Test build #45837 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45837/consoleFull)** for PR 9684 at commit [`85d18d2`](https://github.com/apache/spark/commit/85d18d2dcec4d6d46bddd7b4bf6ad264e40d802a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11718][Yarn][Core]Fix explicitly killed...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9684#discussion_r44819952
  
    --- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala ---
    @@ -538,6 +549,11 @@ private[yarn] class YarnAllocator(
         if (executorIdToContainer.contains(eid)) {
           pendingLossReasonRequests
             .getOrElseUpdate(eid, new ArrayBuffer[RpcCallContext]) += context
    +    } else if (releasedExecutorLossReasons.get(eid) != null) {
    +      // Executor is already released explicitly before getting the loss reason, so directly send
    +      // the pre-stored lost reason
    +      context.reply(releasedExecutorLossReasons.get(eid))
    +      releasedExecutorLossReasons.remove(eid)
         } else {
           logWarning(s"Tried to get the loss reason for non-existent executor $eid")
    --- End diff --
    
    yes, probably good to call `context.sendFailure` here.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org