You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2019/12/22 15:59:44 UTC

[GitHub] [spark] Ngone51 opened a new pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Ngone51 opened a new pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980
 
 
   <!--
   Thanks for sending a pull request!  Here are some tips for you:
     1. If this is your first time, please read our contributor guidelines: https://spark.apache.org/contributing.html
     2. Ensure you have added or run the appropriate tests for your PR: https://spark.apache.org/developer-tools.html
     3. If the PR is unfinished, add '[WIP]' in your PR title, e.g., '[WIP][SPARK-XXXX] Your PR title ...'.
     4. Be sure to keep the PR description updated to reflect all changes.
     5. Please write your PR title to summarize what this PR proposes.
     6. If possible, provide a concise example to reproduce the issue for a faster review.
   -->
   
   ### What changes were proposed in this pull request?
   <!--
   Please clarify what changes you are proposing. The purpose of this section is to outline the changes and how this PR fixes the issue. 
   If possible, please consider writing useful notes for better and faster reviews in your PR. See the examples below.
     1. If you refactor some codes with changing classes, showing the class hierarchy will help reviewers.
     2. If you fix some SQL features, you can provide some references of other DBMSes.
     3. If there is design documentation, please add the link.
     4. If there is a discussion in the mailing list, please add the link.
   -->
   
   Remove it from `CoarseGrainedSchedulerBackend` when `HeartbeatReceiver` recognizes a lost executor. 
   
   ### Why are the changes needed?
   <!--
   Please clarify why the changes are needed. For instance,
     1. If you propose a new API, clarify the use case for a new API.
     2. If you fix a bug, you can clarify why it is a bug.
   -->
   
   Currently, an application may hang if we don't remove a lost executor from `CoarseGrainedSchedulerBackend` as it may happens due to:
   
   1) In `expireDeadHosts()`, `HeartbeatReceiver` calls `scheduler.executorLost()`;
   
   2) Before `HeartbeatReceiver` calls `sc.killAndReplaceExecutor()`(which would mark the lost executor as "pendingToRemove") in a separate thread,  `CoarseGrainedSchedulerBackend` may begins to launch tasks on that executor without realizing it has been lost indeed.
   
   3) If that lost executor doesn't shut down gracefully, `CoarseGrainedSchedulerBackend ` may never receive a disconnect event. As a result, tasks launched on that lost executor become orphans. While at the same time, driver just thinks that those tasks are still running and waits forever.
   
   Removing the lost executor from `CoarseGrainedSchedulerBackend` would let `TaskSetManager` mark those tasks as failed which avoids app hang.
   
   ### Does this PR introduce any user-facing change?
   <!--
   If yes, please clarify the previous behavior and the change this PR proposes - provide the console output, description and/or an example to show the behavior difference if possible.
   If no, write 'No'.
   -->
   
   No
   
   ### How was this patch tested?
   <!--
   If tests were added, say they were added here. Please make sure to add some test cases that check the changes thoroughly including negative and positive cases if possible.
   If it was tested in a way different from regular unit tests, please clarify how you tested step by step, ideally copy and paste-able, so that other reviewers can test and check, and descendants can verify in the future.
   If tests were not added, please describe why they were not added and/or why it was difficult to add.
   -->
   
   Updated existed tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360857961
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   shall we still do this for non-`CoarseGrainedSchedulerBackend`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568687527
 
 
   **[Test build #115699 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115699/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568695330
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361243807
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala
 ##########
 @@ -153,7 +154,6 @@ class HeartbeatReceiverSuite
     heartbeatReceiverClock.advance(executorTimeout)
     heartbeatReceiverRef.askSync[Boolean](ExpireDeadHosts)
     // Only the second executor should be expired as a dead host
-    verify(scheduler).executorLost(meq(executorId2), any())
 
 Review comment:
   Remove this because we no longer invoke `scheduler.executorLost` in `HeartbeatReceiver`'s main thread. And asserts below have already proved that `executorId2` has been expired. So, I think it should be find to remove.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646643
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20483/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765065
 
 
   **[Test build #115743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115743/testReport)** for PR 26980 at commit [`85eede8`](https://github.com/apache/spark/commit/85eede879764774494673ebaeb3dfd8200802c02).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360778246
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   > In case of the executors which are not gracefully shut down
   
   Do you mean `CoarseGrainedSchedulerBackend` is the only one that may shut down executors gracefully? And if it shuts down gracefully, is it OK to have 2 `RemoveExecutor` events?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568275384
 
 
   cc @xuanyuanking @cloud-fan 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568732307
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568694966
 
 
   **[Test build #115717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115717/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179533
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216390
 
 
   **[Test build #115844 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115844/testReport)** for PR 26980 at commit [`e124720`](https://github.com/apache/spark/commit/e1247209fffb71a29610db157e9c673687e5c032).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569201596
 
 
   **[Test build #115844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115844/testReport)** for PR 26980 at commit [`e124720`](https://github.com/apache/spark/commit/e1247209fffb71a29610db157e9c673687e5c032).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569214248
 
 
   **[Test build #115841 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115841/testReport)** for PR 26980 at commit [`662bc50`](https://github.com/apache/spark/commit/662bc5060b2fc892dd026cfa9af62a95e762fe73).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568561668
 
 
   Is it possible to add a unit test for this case?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216467
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568732312
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115717/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568601236
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568575548
 
 
   **[Test build #115647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115647/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568667354
 
 
   **[Test build #115687 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115687/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568732312
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115717/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360777925
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   what about other `SchedulerBackend`? We only send `RemoveExecutor` event for `CoarseGrainedSchedulerBackend`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568622440
 
 
   Maybe I'm missing something, but why do we still allow schedule on the lost executor, instead of mark it as removed immediately?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568642122
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115684/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360819689
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   Yeah, only.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646447
 
 
   **[Test build #115687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115687/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568642116
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569250893
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115853/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187757
 
 
   **[Test build #115841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115841/testReport)** for PR 26980 at commit [`662bc50`](https://github.com/apache/spark/commit/662bc5060b2fc892dd026cfa9af62a95e762fe73).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360782760
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   Hmm...This may can be a problem...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360791173
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   `SchedulerBackend` is private so we don't need to worry about plugins.
   
   Let me ask the question again: is `CoarseGrainedSchedulerBackend` the only one that may shut down executors non-gracefully? According to what you said, it covers all the scheduler backend so the answer is yes?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569206218
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20638/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187757
 
 
   **[Test build #115841 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115841/testReport)** for PR 26980 at commit [`662bc50`](https://github.com/apache/spark/commit/662bc5060b2fc892dd026cfa9af62a95e762fe73).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568641657
 
 
   **[Test build #115684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115684/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] joshrosen-stripe commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
joshrosen-stripe commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361039495
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -201,12 +203,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
         scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
           s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
+            // lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+            sc.schedulerBackend match {
 
 Review comment:
   I think that causes a loop / cycle because `CoarseGrainedSchedulerBackend.removeExecutor` directly calls `TaskSchedulerImpl.executorLost`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824833
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224984
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20646/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361587759
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +202,28 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
 
 Review comment:
   I'd just mention we shall explicitly remove the information of the killed executors from scheduler backend instead of waiting for Disconnect message, to avoid task hang. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640529
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569250893
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115853/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179533
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224770
 
 
   **[Test build #115853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115853/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568694513
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568675032
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360815277
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   The executor may lost in network route, `CoarseGrainedSchedulerBackend` can't handle disconnect event, it happened in our production environment, too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568687615
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115699/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179535
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20629/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568576231
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568778197
 
 
   **[Test build #115743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115743/testReport)** for PR 26980 at commit [`85eede8`](https://github.com/apache/spark/commit/85eede879764774494673ebaeb3dfd8200802c02).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216389
 
 
   **[Test build #115845 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115845/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224978
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568575548
 
 
   **[Test build #115647 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115647/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568687615
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115699/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568677050
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569200409
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20637/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361348573
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
 
 Review comment:
   No serious hurt, but gives waring/error says that the executor has already been removed, etc.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568695330
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568687608
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179292
 
 
   **[Test build #115836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115836/testReport)** for PR 26980 at commit [`46da2bc`](https://github.com/apache/spark/commit/46da2bc8c95b103c15e5d24236f8e1ba54e0a606).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569214561
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115841/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568642122
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115684/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216518
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224978
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360834151
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   I agaree with this, I try to fix it here https://github.com/apache/spark/pull/26938, but it seems not very well, this patch is good to me.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568576240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20446/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568677053
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20494/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569203032
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224770
 
 
   **[Test build #115853 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115853/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569201596
 
 
   **[Test build #115844 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115844/testReport)** for PR 26980 at commit [`e124720`](https://github.com/apache/spark/commit/e1247209fffb71a29610db157e9c673687e5c032).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360817535
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   what if we don't remove it?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216471
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115845/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568778301
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115743/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568624123
 
 
   Does it hurt anything if we send the `RemoveExecutor` message in the main thread? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569206213
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360796114
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   No, no.
   
   Please pay attention to the comment above and note that, here, we're not saying that this issue happens due to `CoarseGrainedSchedulerBackend` may shut down executors non-gracefully. But we say that it may happens due to the executor itself is not gracefully shut down.
   
   It's possible that `CoarseGrainedSchedulerBackend` asks someone executor to shutdown and the executor may shutdown  non-gracefully. And it's also possible that the host where the executor located suddenly crushed that makes the executor shutdown non-gracefully. 
   
   Shutdown non-gracefully here only means that an executor shutdown but `CoarseGrainedSchedulerBackend` doesn't receive disconnect event from it. It really doesn't matter what causes the executor shutdown non-gracefully.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568601241
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115647/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179292
 
 
   **[Test build #115836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115836/testReport)** for PR 26980 at commit [`46da2bc`](https://github.com/apache/spark/commit/46da2bc8c95b103c15e5d24236f8e1ba54e0a606).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569203032
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216518
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824839
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20550/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361571352
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
 
 Review comment:
   makes sense!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568601236
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] joshrosen-stripe commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
joshrosen-stripe commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361039495
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -201,12 +203,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
         scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
           s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
+            // lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+            sc.schedulerBackend match {
 
 Review comment:
   I think that causes a loop / cycle because `CoarseGrainedSchedulerBackend.removeExecutor` already calls `TaskSchedulerImpl.executorLost`.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568687608
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360790014
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   Match `CoarseGrainedSchedulerBackend` covers all scheduler backend(Standalone, YARN, K8S, Mesos) that officially supported by Spark and we can guarantee that this issue can be resolved internally. But for other plugged scheduler backend, Spark seems has nothing to do with current `SchedulerBackend` interface at least for this issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765275
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20537/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640367
 
 
   Jenkins, retest this please.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360796532
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   do you mean only `CoarseGrainedSchedulerBackend` can't handle non-graceful executor shutdown well?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765268
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360817459
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   only means we only need to match `CoarseGrainedSchedulerBackend` here, nothing else.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361574984
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +202,27 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) call scheduler.executorLost() underlying to fail any tasks assigned to
+            //    those executors to avoid app hang
+            // 2) always remove executor information from CoarseGrainedSchedulerBackend for
+            //    a lost executor
+            sc.schedulerBackend match {
+              case backend: CoarseGrainedSchedulerBackend =>
+                backend.driverEndpoint.send(RemoveExecutor(executorId,
+                  SlaveLost(s"Executor heartbeat timed out after ${now - lastSeenMs} ms")))
+              case _: LocalSchedulerBackend =>
+                  // LocalSchedulerBackend is used locally and only has one single executor
+              case _ =>
 
 Review comment:
   let's throw an exception here.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824653
 
 
   **[Test build #115756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115756/testReport)** for PR 26980 at commit [`e6350bf`](https://github.com/apache/spark/commit/e6350bf03deaaef992c855a0fa1fb161778e341c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568667509
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360819557
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   If we fail to remove it from `CoarseGrainedSchedulerBackend`, then, it means the executor has been removed previously(e.g. `onDisconnected()`). And `scheduler.executorLost()` must has been called, too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568667512
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115687/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569179535
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20629/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646198
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568601241
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115647/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216521
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115844/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361592123
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +202,28 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) explicitly remove executor information from CoarseGrainedSchedulerBackend for
+            //    a lost executor instead of waiting for disconnect message
+            // 2) call scheduler.executorLost() underlying to fail any tasks assigned to
+            //    those executors to avoid app hang
+            sc.schedulerBackend match {
+              case backend: CoarseGrainedSchedulerBackend =>
+                backend.driverEndpoint.send(RemoveExecutor(executorId,
+                  SlaveLost(s"Executor heartbeat timed out after ${now - lastSeenMs} ms")))
+              case _: LocalSchedulerBackend =>
+                  // LocalSchedulerBackend is used locally and only has one single executor
 
 Review comment:
   the comment is in a wrong place.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568839216
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115756/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646635
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 edited a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 edited a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568275384
 
 
   cc @xuanyuanking @cloud-fan @jiangxb1987 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824653
 
 
   **[Test build #115756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115756/testReport)** for PR 26980 at commit [`e6350bf`](https://github.com/apache/spark/commit/e6350bf03deaaef992c855a0fa1fb161778e341c).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569223512
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568839215
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569214554
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569250593
 
 
   **[Test build #115853 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115853/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646447
 
 
   **[Test build #115687 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115687/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646635
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361243811
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala
 ##########
 @@ -153,7 +154,6 @@ class HeartbeatReceiverSuite
     heartbeatReceiverClock.advance(executorTimeout)
     heartbeatReceiverRef.askSync[Boolean](ExpireDeadHosts)
     // Only the second executor should be expired as a dead host
-    verify(scheduler).executorLost(meq(executorId2), any())
 
 Review comment:
   Remove this because we no longer invoke `scheduler.executorLost` in `HeartbeatReceiver`'s main thread. And asserts below have already proved that `executorId2` has been expired. So, I think it should be find to remove.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765065
 
 
   **[Test build #115743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115743/testReport)** for PR 26980 at commit [`85eede8`](https://github.com/apache/spark/commit/85eede879764774494673ebaeb3dfd8200802c02).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568778301
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115743/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568677050
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568641657
 
 
   **[Test build #115684 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115684/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569579741
 
 
   thanks, merging to master!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569202745
 
 
   **[Test build #115836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115836/testReport)** for PR 26980 at commit [`46da2bc`](https://github.com/apache/spark/commit/46da2bc8c95b103c15e5d24236f8e1ba54e0a606).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824839
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20550/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360778246
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   > In case of the executors which are not gracefully shut down
   
   Do you mean `CoarseGrainedSchedulerBackend` is the only one that may shut down executors non-gracefully? And if it shuts down gracefully, is it OK to have 2 `RemoveExecutor` events?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568667512
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115687/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569200409
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20637/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568732307
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568642116
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568676649
 
 
   **[Test build #115699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115699/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan closed pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361269391
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) call scheduler.executorLost() underlying to fail any tasks assigned to
 
 Review comment:
   Will we call `scheduler.executorLost` for `LocalSchedulerBackend`?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568646643
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20483/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568600962
 
 
   **[Test build #115647 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115647/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).
    * This patch **fails PySpark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361243811
 
 

 ##########
 File path: core/src/test/scala/org/apache/spark/HeartbeatReceiverSuite.scala
 ##########
 @@ -153,7 +154,6 @@ class HeartbeatReceiverSuite
     heartbeatReceiverClock.advance(executorTimeout)
     heartbeatReceiverRef.askSync[Boolean](ExpireDeadHosts)
     // Only the second executor should be expired as a dead host
-    verify(scheduler).executorLost(meq(executorId2), any())
 
 Review comment:
   Remove this because we no longer invoke `scheduler.executorLost` in `HeartbeatReceiver`'s main thread. And asserts below have already proved that `executorId2` has been expired. So, I think it should be fine to remove.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361587855
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +202,28 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) call scheduler.executorLost() underlying to fail any tasks assigned to
+            //    those executors to avoid app hang
+            // 2) always remove executor information from CoarseGrainedSchedulerBackend for
+            //    a lost executor
+            sc.schedulerBackend match {
+              case backend: CoarseGrainedSchedulerBackend =>
+                backend.driverEndpoint.send(RemoveExecutor(executorId,
+                  SlaveLost(s"Executor heartbeat timed out after ${now - lastSeenMs} ms")))
+              case _: LocalSchedulerBackend =>
+                  // LocalSchedulerBackend is used locally and only has one single executor
+              case _ => throw new UnsupportedOperationException(
 
 Review comment:
   nit: 
   ```
   case other => throw new UnsupportedOperationException(s"Unknown scheduler backend: ${other.getClass})
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640278
 
 
   > Is it possible to add a unit test for this case?
   
   It's kind of difficult...

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765275
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20537/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569200405
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187969
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20634/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569250888
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187966
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216471
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115845/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568695335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20511/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569203034
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115836/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640529
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216467
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360787039
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   then why do we only match `CoarseGrainedSchedulerBackend` here?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
seayoun commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360815277
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   The executor may lost in network route, `CoarseGrainedSchedulerBackend` can't handle disconnect event, it happened in our production environment, too.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360782486
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   > Do you mean CoarseGrainedSchedulerBackend is the only one that may shut down executors non-gracefully? 
   
   No. IIUC, it's executor's own behavior but not related to scheduler backend. An executor can be self-lost(no heartbeat) but still alive, e.g. due to network problem.
   
   > And if it shuts down gracefully, is it OK to have 2 RemoveExecutor events?
   
   Yes, we have protection for this case.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360858546
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   I think this is needed for non-`TaskSchedulerImpl` ?
   
   This can be a behavior change for other TaskScheduler, I'll revert this later.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568677053
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20494/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361587027
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
 
 Review comment:
   The problem is we should avoid call TaskScheduler directly here, instead we shall shadow it behind the SchedulerBackend, to make resource information tracking  easier.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569206213
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568642110
 
 
   **[Test build #115684 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115684/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).
    * This patch **fails build dependency tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568839215
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640533
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20480/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569214561
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115841/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361269779
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
 
 Review comment:
   does it hurt to call `scheduler.executorLost` twice? Since we assume killing executors may take a while, looks reasonable to call `scheduler.executorLost` and re-submit tasks eariler.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568731943
 
 
   **[Test build #115717 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115717/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568765268
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568778299
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568839216
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115756/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568676649
 
 
   **[Test build #115699 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115699/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568576240
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20446/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569205933
 
 
   **[Test build #115845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115845/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568576231
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568695335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20511/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569200405
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568632758
 
 
   > Does it hurt anything if we send the `RemoveExecutor` message in the main thread?
   
   The problem is that we must send `RemoveExecutor` message after `sc.killAndReplaceExecutor(executorId)` in order to make sure that the executor has already been marked as "pendingToRemove"(in `sc.killAndReplaceExecutor`). Sending `RemoveExecutor` message in the main thread doesn't guarantee that and the issue could still exist.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569206218
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20638/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568622440
 
 
   Maybe I'm missing something, but why do we still allow schedule on the lost executor, instead of mark it as removed immediately?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361040368
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -201,12 +203,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
         scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
           s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
+            // lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+            sc.schedulerBackend match {
 
 Review comment:
   ah, i see

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569224984
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20646/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569205933
 
 
   **[Test build #115845 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115845/testReport)** for PR 26980 at commit [`5f2f86a`](https://github.com/apache/spark/commit/5f2f86ae4d1f44a12088f7f7a4cb43e9a1189837).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187966
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360809654
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
 
 Review comment:
   Yeah, in this case, `CoarseGrainedSchedulerBackend` can't handle non-graceful executor shutdown well. But I am not sure what do you mean by "only", here? What other factors do you also care about?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568778299
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361350788
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
 
 Review comment:
   > Since we assume killing executors may take a while, looks reasonable to call scheduler.executorLost and re-submit tasks earlier.
   
   I noticed a wired problem of this. That is when we call `scheduler.executorLost` in `HeartbeatReceiver`, `scheduler` knows which executor lost, but `CoarseGrainedSchedulerBackend` doesn't. So, even if we want to re-submit tasks earlier, tasks can still be launched on that lost executor.
   
   But for other two places where we call `scheduler.executorLost`, it guarantees that `CoarseGrainedSchedulerBackend` has realized about the lost/killed executor.
   
   So, I think that `HeartbeatReceiver` may has used it incorrectly, here.
   
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569250888
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
jiangxb1987 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360982304
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -201,12 +203,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
         scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
           s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // In case of the executors which are not gracefully shut down, we should remove
+            // lost executors from CoarseGrainedSchedulerBackend manually here (SPARK-27348)
+            sc.schedulerBackend match {
 
 Review comment:
   Why not put this into TaskSchedulerImpl.executorLost() ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
Ngone51 commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r360714352
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,20 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
 
 Review comment:
   Since remove executor in `CoarseGrainedSchedulerBackend` will always do `scheduler.executorLost()`, so it's no longer needed here. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568824833
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569216521
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115844/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568838944
 
 
   **[Test build #115756 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115756/testReport)** for PR 26980 at commit [`e6350bf`](https://github.com/apache/spark/commit/e6350bf03deaaef992c855a0fa1fb161778e341c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569214554
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361269587
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) call scheduler.executorLost() underlying to fail any tasks assigned to
+            //    those executors to avoid app hang
+            // 2) always remove executor information from CoarseGrainedSchedulerBackend for
+            //    a lost executor
+            sc.schedulerBackend match {
+              case backend: CoarseGrainedSchedulerBackend =>
+                backend.driverEndpoint.send(RemoveExecutor(executorId,
+                  SlaveLost(s"Executor heartbeat timed out after ${now - lastSeenMs} ms")))
+              case _ =>
 
 Review comment:
   we should match `LocalSchedulerBackend`, do nothing but leaving a comment to say that `LocalSchedulerBackend`only has a single executor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569187969
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20634/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][CORE] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-569203034
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/115836/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568667509
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568640533
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20480/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#discussion_r361269466
 
 

 ##########
 File path: core/src/main/scala/org/apache/spark/HeartbeatReceiver.scala
 ##########
 @@ -199,14 +201,25 @@ private[spark] class HeartbeatReceiver(sc: SparkContext, clock: Clock)
       if (now - lastSeenMs > executorTimeoutMs) {
         logWarning(s"Removing executor $executorId with no recent heartbeats: " +
           s"${now - lastSeenMs} ms exceeds timeout $executorTimeoutMs ms")
-        scheduler.executorLost(executorId, SlaveLost("Executor heartbeat " +
-          s"timed out after ${now - lastSeenMs} ms"))
-          // Asynchronously kill the executor to avoid blocking the current thread
+        // Asynchronously kill the executor to avoid blocking the current thread
         killExecutorThread.submit(new Runnable {
           override def run(): Unit = Utils.tryLogNonFatalError {
             // Note: we want to get an executor back after expiring this one,
             // so do not simply call `sc.killExecutor` here (SPARK-8119)
             sc.killAndReplaceExecutor(executorId)
+            // SPARK-27348: in case of the executors which are not gracefully shut down,
+            // we should remove lost executors from CoarseGrainedSchedulerBackend manually
+            // here to guarantee two things:
+            // 1) call scheduler.executorLost() underlying to fail any tasks assigned to
 
 Review comment:
   nvm, `LocalSchedulerBackend` only has a single executor.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #26980: [SPARK-27348][Core] HeartbeatReceiver should remove lost executors from CoarseGrainedSchedulerBackend
URL: https://github.com/apache/spark/pull/26980#issuecomment-568694966
 
 
   **[Test build #115717 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/115717/testReport)** for PR 26980 at commit [`b4e29e8`](https://github.com/apache/spark/commit/b4e29e86a73be3dfb657268cee480af95ff1995a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org