Posted to reviews@spark.apache.org by ericl <gi...@git.apache.org> on 2017/03/04 23:51:54 UTC

[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

GitHub user ericl opened a pull request:

    https://github.com/apache/spark/pull/17166

    [SPARK-19820] [core] Allow reason to be specified for task kill

    ## What changes were proposed in this pull request?
    
    This refactors the task kill path to allow specifying a reason for the task kill. The reason is propagated opaquely through events, and will show up in the UI automatically as `(N tasks killed: $reason)` and `TaskKilled: $reason`.
    
    Also, make the logic for whether a task failure should be retried explicit rather than special-casing TaskKilled messages.
    
    cc @rxin
    
    ## How was this patch tested?
    
    Existing tests, tried killing some stages in the UI and verified the messages are as expected.
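
    For illustration, a minimal sketch of how the new API might be invoked from a driver program, assuming the `killTaskAttempt` signature reviewed later in this thread (`taskId: Long, interruptThread: Boolean = true, reason: String = "cancelled"`); the task ID used here is a placeholder.

    ```scala
    // Hedged usage sketch; assumes SparkContext.killTaskAttempt as discussed below.
    import org.apache.spark.{SparkConf, SparkContext}

    val sc = new SparkContext(new SparkConf().setAppName("kill-reason-demo").setMaster("local[2]"))

    // Task IDs can be found in the Spark UI or via SparkListener.onTaskStart.
    val taskId: Long = 42L  // placeholder; substitute a running task attempt's ID
    sc.killTaskAttempt(taskId, interruptThread = true, reason = "killed by user: demo")
    ```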

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ericl/spark kill-reason

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17166
    
----
commit e9178b61f356ecf4469a58a05ee4183e7beb4bf9
Author: Eric Liang <ek...@google.com>
Date:   2017-03-04T23:47:36Z

    Allow reason to be specified for task kill

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74576/testReport)** for PR 17166 at commit [`72b28cb`](https://github.com/apache/spark/commit/72b28cb0dc7aacf7cde1b5e49f05da49cb5de276).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean, reason: String)`


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73917/testReport)** for PR 17166 at commit [`91b8aef`](https://github.com/apache/spark/commit/91b8aeff8adca4454b9631a0bfa01876de71bb53).


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107272852
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,8 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            context.killTaskIfInterrupted()
    +            null  // not reached
    --- End diff --
    
    Done


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106051697
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    > What is the expectation when a task is being killed.
    > Is it specifically for the task being referenced; or all attempts of the task ?
    
    The current task attempt (which is uniquely identified by the task ID). I updated the docs as suggested here.
    
    > "killAndRescheduleTask" implies it will be rescheduled - which might not occur in case this was a speculative task (or already completed) : would be good to clarify.
    
    Went with killTaskAttempt.
    
    > Is this expected to be exposed via the UI ?
    > How is it to be leveraged (if not via UI) ?
    
    For now, you can look at the Spark UI, find the task ID, and call killTaskAttempt on it. It would be nice to have this as a button on the executor page in a follow-up. You can also have a listener that kills tasks as suggested.
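
    A rough sketch of that listener idea, assuming the `killTaskAttempt` API added in this PR; `shouldKill` is a hypothetical application-specific predicate, not part of Spark:

    ```scala
    import org.apache.spark.SparkContext
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskStart}

    // Kills task attempts that an application-specific check flags as bad.
    class KillingListener(sc: SparkContext) extends SparkListener {
      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
        val taskId = taskStart.taskInfo.taskId
        if (shouldKill(taskId)) {  // hypothetical predicate
          sc.killTaskAttempt(taskId, interruptThread = true, reason = "killed by listener")
        }
      }
      private def shouldKill(taskId: Long): Boolean = false  // placeholder logic
    }

    // Register on the driver with sc.addSparkListener(new KillingListener(sc)).
    ```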


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74572 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74572/testReport)** for PR 17166 at commit [`31967d1`](https://github.com/apache/spark/commit/31967d185852870d8edecb855ea1aafb7bd04dd1).


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104772015
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    IIRC, KILLED was always used internally for killing tasks that did not need a retry - hence the check.
    Good point @kayousterhout, we might need to revisit the use of KILLED to ensure this does not break now that developers can invoke kills on tasks which might need a retry.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104572407
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    cc @kayousterhout does removing this check seem safe to you? It looks like the only case `taskState != TaskState.KILLED` guards against here is cancelled speculative tasks. Since those are relatively rare, it seems ok to call revive offers in those cases unconditionally. Tasks from cancelled stages and jobs should still be handled by the remaining isZombie check.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75130/
    Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74679 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74679/testReport)** for PR 17166 at commit [`fda712d`](https://github.com/apache/spark/commit/fda712de614a4ab37d359b1c192415d1df894ab1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106555729
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,22 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill and reschedule the given task attempt. Task ids can be obtained from the Spark UI
    +   * or through SparkListener.onTaskStart.
    +   *
    +   * @param taskId the task ID to kill. This id uniquely identifies the task attempt.
    +   * @param interruptThread whether to interrupt the thread running the task.
    +   * @param reason the reason for killing the task, which should be a short string. If a task
    +   *   is killed multiple times with different reasons, only one reason will be reported.
    +   */
    +  def killTaskAttempt(
    +      taskId: Long,
    +      interruptThread: Boolean = true,
    +      reason: String = "cancelled"): Unit = {
    --- End diff --
    
    Done


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107552896
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,7 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            throw new TaskKilledException(context.getKillReason().getOrElse("unknown reason"))
    --- End diff --
    
    Hm ok if Mridul wants this then fine to leave as-is


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107543449
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,21 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    +    logInfo(s"Killing task ($reason): $taskId")
    +    val execId = taskIdToExecutorId.getOrElse(
    +      taskId, throw new IllegalArgumentException("Task not found: " + taskId))
    --- End diff --
    
    Done


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75069/
    Test PASSed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104593790
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -158,7 +158,8 @@ private[spark] class Executor(
         threadPool.execute(tr)
       }
     
    -  def killTask(taskId: Long, interruptThread: Boolean): Unit = {
    +  def killTask(
    --- End diff --
    
    this fits in one line?
    
    ```
      def killTask(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    ```


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74999/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74059 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74059/testReport)** for PR 17166 at commit [`90d2c98`](https://github.com/apache/spark/commit/90d2c980ce36b66a1ca421f29f08c0d16818c1f7).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class TaskKilled(reason: String) extends TaskFailedReason `


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74059/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74702/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75000 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75000/testReport)** for PR 17166 at commit [`6e8593b`](https://github.com/apache/spark/commit/6e8593b9bb88a2b0bf90e39887368cc4535480b6).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107274054
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    There is no need, but reviving offers has no effect either way. Those tasks will not be resubmitted even if reviveOffers() is called (in fact, reviveOffers() is called periodically on a timer thread, so if this was an issue we should have already seen it).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    I merged this to master.  I realized that the PR description is still from an old version of the change, so I modified the commit message to add that this also adds the SparkContext.killTaskAttempt method.  Thanks for all of the work here @ericl!


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75064/
    Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Hi @kayousterhout,
      Can you take over reviewing this PR? I might be tied up with other things for the next couple of weeks, and I don't want @ericl's work to be blocked on me.
    
    Thx


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104593724
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    same thing for the lower level dag scheduler api.



[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74572/
    Test PASSed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by JoshRosen <gi...@git.apache.org>.
Github user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104571549
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
    @@ -212,8 +215,8 @@ case object TaskResultLost extends TaskFailedReason {
      * Task was killed intentionally and needs to be rescheduled.
      */
     @DeveloperApi
    -case object TaskKilled extends TaskFailedReason {
    -  override def toErrorString: String = "TaskKilled (killed intentionally)"
    +case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason {
    --- End diff --
    
    I think it's a little weird for `shouldRetry` to be part of this interface: whether a killed task should be retried is already handled by existing scheduling logic, and it's confusing to set it on a per-task basis. This class isn't the source of truth on whether we can/will retry, so the field's purpose here is a bit unclear to me.
    
    As discussed offline, I think that we may be able to update the logic in `TaskSchedulerImpl.handleFailedTask` (the only place which reads this field) in order to eliminate the need for this field in this class.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104595023
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -158,7 +158,8 @@ private[spark] class Executor(
         threadPool.execute(tr)
       }
     
    -  def killTask(taskId: Long, interruptThread: Boolean): Unit = {
    +  def killTask(
    --- End diff --
    
    Fixed


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/17166


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73912/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73917/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74576/testReport)** for PR 17166 at commit [`72b28cb`](https://github.com/apache/spark/commit/72b28cb0dc7aacf7cde1b5e49f05da49cb5de276).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74570 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74570/testReport)** for PR 17166 at commit [`6c90289`](https://github.com/apache/spark/commit/6c902898b7b29f5f32a1618cfbf06e39c8fc3f0f).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74999 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74999/testReport)** for PR 17166 at commit [`203a900`](https://github.com/apache/spark/commit/203a90020031b71d976f60491d757c4d78b85517).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75064 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75064/testReport)** for PR 17166 at commit [`a37c09b`](https://github.com/apache/spark/commit/a37c09b78ab5362e3464e8201f1839cacef8a382).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104599319
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    --- End diff --
    
    "killed by user via SparkContext.killTask"?  These things can be hard to debug when they're unexpected and "cancelled" isn't very helpful


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74697 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74697/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74057 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74057/testReport)** for PR 17166 at commit [`02d81b5`](https://github.com/apache/spark/commit/02d81b53cb7ea8a5383c800fb98c9a90c968cb8f).


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107070379
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
    @@ -354,7 +354,7 @@ private[spark] object UIUtils extends Logging {
             {completed}/{total}
             { if (failed > 0) s"($failed failed)" }
             { if (skipped > 0) s"($skipped skipped)" }
    -        { if (killed > 0) s"($killed killed)" }
    +        { reasonToNumKilled.map { case (reason, count) => s"($count killed: $reason)" } }
    --- End diff --
    
    Done


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106051633
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    --- End diff --
    
    The only issue here is that the UI is not great at rendering long strings (it tends to cut them off). I'd prefer to keep it something concise for now.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74570/testReport)** for PR 17166 at commit [`6c90289`](https://github.com/apache/spark/commit/6c902898b7b29f5f32a1618cfbf06e39c8fc3f0f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Yes -- this is useful if you want to implement extensions to Spark that can kill tasks for other reasons, e.g. if a debugger detects that a task has entered a bad state. Without this change, there is no way to provide the user feedback through the UI.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107235703
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
    @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason {
      * Task was killed intentionally and needs to be rescheduled.
      */
     @DeveloperApi
    -case object TaskKilled extends TaskFailedReason {
    -  override def toErrorString: String = "TaskKilled (killed intentionally)"
    +case class TaskKilled(reason: String) extends TaskFailedReason {
    +  override def toErrorString: String = s"TaskKilled ($reason)"
    --- End diff --
    
    That is unfortunate, but it looks like it can't be helped if we need this feature.
    Probably something to keep in mind for future use of case objects!
    
    Thx for clarifying.
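
    For downstream consumers, the switch from `case object TaskKilled` to `case class TaskKilled(reason: String)` means pattern matches need a constructor pattern. A hedged sketch of a listener reading the new reason, assuming the case class shown in the diff above:

    ```scala
    import org.apache.spark.TaskKilled
    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}

    class KillReasonLogger extends SparkListener {
      override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
        taskEnd.reason match {
          // Was `case TaskKilled =>` when TaskKilled was a case object.
          case TaskKilled(reason) => println(s"task ${taskEnd.taskInfo.taskId} killed: $reason")
          case _ => // other end reasons not of interest here
        }
      }
    }
    ```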


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107234055
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
       /** List of callback functions to execute when the task fails. */
       @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
     
    -  // Whether the corresponding task has been killed.
    -  @volatile private var interrupted: Boolean = false
    +  // If defined, the corresponding task has been killed for the contained reason.
    +  @volatile private var maybeKillReason: Option[String] = None
    --- End diff --
    
    nit: Overloading `maybeKillReason` to indicate `interrupted` status smells a bit, but might be OK for now.
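
    For context, a sketch of how task-side code observes this state, using the methods referenced in the diffs in this thread (`isInterrupted`, `getKillReason`, `killTaskIfInterrupted`). Some of these may be `private[spark]`, so this is illustrative rather than a public-API example; `doSomeWork` is hypothetical:

    ```scala
    import org.apache.spark.{TaskContext, TaskKilledException}

    def runLoop(doSomeWork: () => Unit): Unit = {
      val ctx = TaskContext.get()
      while (true) {  // loops until the task is killed
        if (ctx.isInterrupted()) {
          // Propagate the reason supplied with the kill request, if any.
          throw new TaskKilledException(ctx.getKillReason().getOrElse("unknown reason"))
        }
        // Equivalently, the helper seen in the PythonRDD diff does the check-and-throw:
        // ctx.killTaskIfInterrupted()
        doSomeWork()
      }
    }
    ```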


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107533292
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
    --- End diff --
    
    Can you update the comment here?


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107542655
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
    --- End diff --
    
    Done


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Drilling down into the detail view is kind of cumbersome -- I think it's most useful to have a good summary at the progress bar, and then the user can refer to logs for detailed per-task debugging.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106789380
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  // Launches one task that will run forever. Once the SparkListener detects the task has
    +  // started, kill and re-schedule it. The second run of the task will complete immediately.
    +  // If this test times out, then the first version of the task wasn't killed successfully.
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTaskAttempt(taskStart.taskInfo.taskId, true, "first attempt will hang")
    +        }
    +      }
    --- End diff --
    
    An `onTaskEnd` handler validating the failure of the first task and the launch/success of the second attempt should be added here, e.g. as sketched below.
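
    A hedged sketch of what that validation could look like (the flags `firstAttemptKilled` and `secondAttemptSucceeded` are hypothetical additions to the suite's companion object):

        override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
          taskEnd.reason match {
            case _: TaskKilled =>
              SparkContextSuite.firstAttemptKilled = true
            case Success if SparkContextSuite.firstAttemptKilled =>
              SparkContextSuite.secondAttemptSucceeded = true
            case _ =>  // other end reasons are left to the test's assertions
          }
        }
        // ...and at the end of the test body:
        // assert(SparkContextSuite.firstAttemptKilled)
        // assert(SparkContextSuite.secondAttemptSucceeded)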


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107776906
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    What if Eric changes TSM.handleFailedTask to return a boolean value indicating whether the failed task needs to be re-scheduled?  Then we could use that to decide whether to call reviveOffers.
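
    A minimal sketch of that proposal (the Boolean return type on handleFailedTask is hypothetical here):

        // handleFailedTask returns true iff the failed task needs to be
        // re-scheduled; only then do we pay for a full reviveOffers().
        val needsReschedule = taskSetManager.handleFailedTask(tid, taskState, reason)
        if (needsReschedule && !taskSetManager.isZombie) {
          backend.reviveOffers()
        }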


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106555639
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,22 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill and reschedule the given task attempt. Task ids can be obtained from the Spark UI
    +   * or through SparkListener.onTaskStart.
    +   *
    +   * @param taskId the task ID to kill. This id uniquely identifies the task attempt.
    +   * @param interruptThread whether to interrupt the thread running the task.
    +   * @param reason the reason for killing the task, which should be a short string. If a task
    +   *   is killed multiple times with different reasons, only one reason will be reported.
    +   */
    +  def killTaskAttempt(
    +      taskId: Long,
    +      interruptThread: Boolean = true,
    +      reason: String = "cancelled"): Unit = {
    --- End diff --
    
    As discussed, how about "killed via SparkContext.killTaskAttempt" or similar?
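
    Applied to the diff above, the suggestion is just a different default argument (sketch):

        def killTaskAttempt(
            taskId: Long,
            interruptThread: Boolean = true,
            reason: String = "killed via SparkContext.killTaskAttempt"): Unit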


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107534312
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,21 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    +    logInfo(s"Killing task ($reason): $taskId")
    +    val execId = taskIdToExecutorId.getOrElse(
    +      taskId, throw new IllegalArgumentException("Task not found: " + taskId))
    --- End diff --
    
    Also it's kind of ugly that this throws an exception (it seems like it could be an unhappy surprise to the user that their SparkContext threw an exception / died).  How about instead changing the killTaskAttempt calls to return a boolean that's true if the task was successfully killed (and then returning false here)?
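
    A sketch of the Boolean-returning variant (logging and wording are illustrative):

        override def killTaskAttempt(
            taskId: Long, interruptThread: Boolean, reason: String): Boolean = {
          logInfo(s"Killing task $taskId ($reason)")
          taskIdToExecutorId.get(taskId) match {
            case Some(execId) =>
              backend.killTask(taskId, execId, interruptThread, reason)
              true  // the kill request was sent to the executor
            case None =>
              logWarning(s"Could not kill task $taskId: no such task attempt")
              false // nothing to kill; the caller decides what to do about it
          }
        }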


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107528922
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,7 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            throw new TaskKilledException(context.getKillReason().getOrElse("unknown reason"))
    --- End diff --
    
    why do you need the getOrElse here? (since isInterrupted is true, shouldn't this always be defined?)
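
    If that invariant holds (both the flag and the reason live in the same Option, so isInterrupted implies a defined reason), the fallback could be dropped, e.g. (sketch):

        case e: Exception if context.isInterrupted =>
          logDebug("Exception thrown after task interruption", e)
          // isInterrupted is true, so the reason must be defined; a bare .get
          // would surface a broken invariant instead of masking it.
          throw new TaskKilledException(context.getKillReason().get)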


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104769358
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    I am unclear about the expectations from the api.
    - What is the expectation when a task is being killed?
      - Is it specifically for the task attempt being referenced, or for all attempts of the task?
      - If the latter, do we discard already succeeded tasks?
      - "killAndRescheduleTask" implies it will be rescheduled - which might not occur if this was a speculative task (or one that already completed): would be good to clarify.
    - Is this expected to be exposed via the UI?
      - How is it to be leveraged (if not via the UI)?
    - How does one get the taskId - via SparkListener.onTaskStart?
      - Any other means? (IIRC there is no other way to get at it, but good to clarify for api users)


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106053125
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -239,8 +244,9 @@ private[spark] class Executor(
          */
         @volatile var task: Task[Any] = _
     
    -    def kill(interruptThread: Boolean): Unit = {
    -      logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
    +    def kill(interruptThread: Boolean, reason: String): Unit = {
    +      logInfo(s"Executor is trying to kill $taskName (TID $taskId), reason: $reason")
    --- End diff --
    
    Which paren do you mean here?


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106063061
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -239,8 +244,9 @@ private[spark] class Executor(
          */
         @volatile var task: Task[Any] = _
     
    -    def kill(interruptThread: Boolean): Unit = {
    -      logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
    +    def kill(interruptThread: Boolean, reason: String): Unit = {
    +      logInfo(s"Executor is trying to kill $taskName (TID $taskId), reason: $reason")
    --- End diff --
    
    Hm, good question; this looks fine.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74059 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74059/testReport)** for PR 17166 at commit [`90d2c98`](https://github.com/apache/spark/commit/90d2c980ce36b66a1ca421f29f08c0d16818c1f7).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75005/
    Test PASSed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107273498
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
    --- End diff --
    
    This one gets deserialized to null sometimes, so it seemed cleaner to use a bare string.
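
    For context, a self-contained illustration of the hazard (the class and field names are made up): Java serialization restores @transient fields to their JVM default -- null for references -- rather than to None.

        import java.io._

        class Holder(@transient var maybeReason: Option[String]) extends Serializable

        val bytes = new ByteArrayOutputStream()
        new ObjectOutputStream(bytes).writeObject(new Holder(None))
        val holder = new ObjectInputStream(
          new ByteArrayInputStream(bytes.toByteArray)).readObject().asInstanceOf[Holder]
        assert(holder.maybeReason == null)  // not None: calling .isDefined would NPE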


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Thinking about this more, this seems like two separate changes (that should probably be separated):
    
    (1) Allowing cancellations to be injected via SparkContext.  This seems like it should have its own JIRA, and is relatively few LOC (so should be easy to decouple).  Those changes look fine and I think are good to merge as-is if you move them to a new PR.
    
    (2) Allowing reasons to be specified.  This changes the API and changes many LOC.  I'm skeptical of this change: I think this could be helpful if descriptive reasons are allowed (like the few I suggested in the comments), but if you restrict reasons to a few words so that they fit in the stage summary page, they don't seem very useful to a user.  E.g., the default message of "cancelled" when sc.killTask is used seems pretty meaningless (and will require someone to read the code to understand -- at which point it seems like they might as well look in the logs instead of getting info from the UI).  This doesn't seem useful enough to merit an API change, but maybe I'm missing something important here?


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107070165
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    This shouldn't be a behavior change, just a simplification of the logic to always call reviveOffers(). Whether the task is rescheduled or not is decided independently of this call -- it's just nice to always call reviveOffers so the task won't sit pending until the next round of offers.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    In that case, let us make it a more complete PR - with the proposed api changes included - so that we can evaluate the merits of the change in total.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73913/
    Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107237325
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,8 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            context.killTaskIfInterrupted()
    +            null  // not reached
    --- End diff --
    
    nit: It would be good if we could directly throw the exception here instead of relying on killTaskIfInterrupted to do the right thing (the task is already interrupted according to the case check).
    Not only would it remove the unreachable `null`, it would also ensure that future changes to `killTaskIfInterrupted`, interrupt resets, etc. do not break this.
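
    Concretely, the suggestion amounts to something like (a sketch of the quoted case clause):

        case e: Exception if context.isInterrupted =>
          logDebug("Exception thrown after task interruption", e)
          // Throw directly: no unreachable null, and no dependency on what
          // killTaskIfInterrupted happens to do in the future.
          throw new TaskKilledException(context.getKillReason().getOrElse("unknown reason"))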


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107541348
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    I left my comment before seeing Eric's -- but I agree with Eric that we should leave this as-is.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75130 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75130/testReport)** for PR 17166 at commit [`8c4381f`](https://github.com/apache/spark/commit/8c4381f7eb66d4bc6265be28a82f91ef9f3fa0c0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74679 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74679/testReport)** for PR 17166 at commit [`fda712d`](https://github.com/apache/spark/commit/fda712de614a4ab37d359b1c192415d1df894ab1).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106053942
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ---
    @@ -54,6 +54,9 @@ private[spark] trait TaskScheduler {
       // Cancel a stage.
       def cancelTasks(stageId: Int, interruptThread: Boolean): Unit
     
    +  // Kill a task.
    --- End diff --
    
    Done


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73912/testReport)** for PR 17166 at commit [`e9178b6`](https://github.com/apache/spark/commit/e9178b61f356ecf4469a58a05ee4183e7beb4bf9).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
      * `  case class KillTask(`


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107533603
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,21 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    +    logInfo(s"Killing task ($reason): $taskId")
    --- End diff --
    
    super nit, but can you make this s"Killing task $taskId ($reason)"? That is somewhat more consistent with task-level logging elsewhere.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75005 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75005/testReport)** for PR 17166 at commit [`5707715`](https://github.com/apache/spark/commit/570771555c877fc0b7a8c989e14fdaf4aa79c217).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74697/
    Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107778618
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    Oh, that sounds great to me @ericl, and it's minimally invasive!


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74940/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74993 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74993/testReport)** for PR 17166 at commit [`884a3ad`](https://github.com/apache/spark/commit/884a3ad7308e69c0ca010c344133bcce6582920d).


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107234445
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
    --- End diff --
    
    Any reason to make this a String and not an Option[String], like the other places where it is defined/used?


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    If I did not miss it, there is no way for a user to provide this information currently, right?
    Or is that coming in a subsequent PR ?


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74751/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104600689
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -538,10 +538,37 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTask(taskStart.taskInfo.taskId, "first attempt will hang")
    +        }
    +      }
    +    }
    +    sc.addSparkListener(listener)
    +    sc.parallelize(1 to 1).foreach { x =>
    --- End diff --
    
    Can you wrap this part in an eventually statement so that the test fails relatively quickly (i.e., in less than the 250m overall test timeout) if the kill fails?
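
    A hedged sketch of the wrapped launch (the timeout value is illustrative; the flags come from the test above):

        eventually(timeout(20.seconds)) {
          sc.parallelize(1 to 1).foreach { x =>
            // The first attempt hangs until the listener kills it; the retried
            // attempt falls through and the job completes.
            if (!SparkContextSuite.isTaskStarted) {
              SparkContextSuite.isTaskStarted = true
              Thread.sleep(9999999)
            }
          }
        }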


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107271262
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -140,16 +140,22 @@ private[spark] class TaskContextImpl(
       }
     
       /** Marks the task for interruption, i.e. cancellation. */
    -  private[spark] def markInterrupted(): Unit = {
    -    interrupted = true
    +  private[spark] def markInterrupted(reason: String): Unit = {
    +    maybeKillReason = Some(reason)
    +  }
    +
    +  private[spark] override def killTaskIfInterrupted(): Unit = {
    +    if (maybeKillReason.isDefined) {
    +      throw new TaskKilledException(maybeKillReason.get)
    --- End diff --
    
    Done


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106060636
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala ---
    @@ -104,7 +104,8 @@ private[spark] class MesosExecutorBackend
           logError("Received KillTask but executor was null")
         } else {
           // TODO: Determine the 'interruptOnCancel' property set for the given job.
    -      executor.killTask(t.getValue.toLong, interruptThread = false)
    +      executor.killTask(
    +        t.getValue.toLong, interruptThread = false, reason = "killed intentionally")
    --- End diff --
    
    Done


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104600582
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -538,10 +538,37 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    --- End diff --
    
    Can you add a comment about what this test is doing?  e.g.,
    
    Launches one task that will run forever.  Once the SparkListener detects the task has started, kill and re-schedule it.  The second run of the task will complete immediately.  If this test times out, then the first version of the task wasn't killed successfully.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107533097
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
     
       protected var _executorDeserializeTime: Long = 0
       protected var _executorDeserializeCpuTime: Long = 0
     
       /**
        * Whether the task has been killed.
        */
    -  def killed: Boolean = _killed
    +  def killed: Boolean = _maybeKillReason != null
    +
    +  /**
    +   * If this task has been killed, contains the reason for the kill.
    --- End diff --
    
    As above, can you make the comment "If specified, this task has been killed and this option contains the reason." (assuming that you get rid of the killed variable)


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106066177
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
           logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
             s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
             s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    -      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    +      sched.backend.killTask(
    +        attemptInfo.taskId,
    +        attemptInfo.executorId,
    +        interruptThread = true,
    +        reason = "another attempt succeeded")
    --- End diff --
    
    I added two screenshots to the PR description. In the second scenario having a verbose reason is fine, but in the stage summary view long or many distinct reasons would overflow the progress bar.
    
    We could probably fix the CSS to allow slightly longer / more reasons, but even that wouldn't be great if each task had a different reason.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73916/testReport)** for PR 17166 at commit [`1a716aa`](https://github.com/apache/spark/commit/1a716aa31ff2e8e6f5d8e3b73362d28b944319f2).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107776242
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    @mridulm It's possible that the makeOffers() call causes a different job's tasks to be executed on the given executor.  Fundamentally, the problem is that the killed task needs to be re-scheduled on a different executor, and the only way to guarantee that the task gets offered new/different executors is to do a full reviveOffers() call (which is why the code in question exists in the first place).
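    
    For illustration, a rough sketch of the distinction (interfaces simplified and assumed; this is not the actual Spark code):
    
        // Sketch only: makeOffers(executorId) re-offers a single executor's
        // slots, while reviveOffers() re-offers every executor.
        trait Backend {
          def makeOffers(executorId: String): Unit
          def reviveOffers(): Unit
        }
    
        def handleKilledTask(backend: Backend, freedExecutorId: String): Unit = {
          // Re-offering only the freed executor could hand the slot to a
          // different job, or put the killed task right back on the same
          // executor. Only a full revive guarantees the task is offered
          // new/different executors:
          backend.reviveOffers()
        }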




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104599899
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
           logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
             s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
             s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    -      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    +      sched.backend.killTask(
    +        attemptInfo.taskId,
    +        attemptInfo.executorId,
    +        interruptThread = true,
    +        reason = "another attempt succeeded")
    --- End diff --
    
    Can you put the longer version that's in the info log here? (e.g. s"Attempt ${info.attemptNumber} succeeded on ${info.host}")
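    
    i.e., something like this, based on the diff above:
    
        sched.backend.killTask(
          attemptInfo.taskId,
          attemptInfo.executorId,
          interruptThread = true,
          reason = s"attempt ${info.attemptNumber} succeeded on ${info.host}")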




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106555744
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
           logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
             s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
             s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    -      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    +      sched.backend.killTask(
    +        attemptInfo.taskId,
    +        attemptInfo.executorId,
    +        interruptThread = true,
    +        reason = "another attempt succeeded")
    --- End diff --
    
    Ok let's leave this as-is -- seems too complicated to have a longer and shorter reason (and unlike the reason above, this one is per-task, so hard to summarize on the stage page)




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104598221
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    What about calling it "killAndRescheduleTask" then? Otherwise "kill" is a little misleading -- where we use it elsewhere (to kill a stage) it implies no retry.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107561714
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -296,12 +298,13 @@ private[spark] class Executor(
     
             // If this task has been killed before we deserialized it, let's quit now. Otherwise,
             // continue executing the task.
    -        if (killed) {
    +        val killReason = reasonIfKilled
    --- End diff --
    
    Ugh, in retrospect I think TaskContext should have just clearly documented that an invariant of reasonIfKilled is that, once set, it won't be un-set; then we'd avoid all of these corner cases. But it's not worth changing now.
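    
    For reference, the invariant being suggested could be documented roughly like this (a sketch; the field shape is assumed from this thread):
    
        /**
         * If defined, this task has been killed and this contains the reason.
         * Invariant: once set, this is never reset to None.
         */
        @volatile private var reasonIfKilled: Option[String] = None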




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75117 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75117/testReport)** for PR 17166 at commit [`3ec3633`](https://github.com/apache/spark/commit/3ec3633c3362f8c5a5d1703f21b0569d58179201).




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107559290
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -296,12 +298,13 @@ private[spark] class Executor(
     
             // If this task has been killed before we deserialized it, let's quit now. Otherwise,
             // continue executing the task.
    -        if (killed) {
    +        val killReason = reasonIfKilled
    --- End diff --
    
    If we assign to a temporary, there is no risk of seeing concurrent mutations of the value as we access it below (though this cannot currently happen).
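    
    A minimal sketch of the read-once pattern being described (field and exception shapes assumed from this thread):
    
        @volatile private var reasonIfKilled: Option[String] = None
    
        def checkKilled(): Unit = {
          // Read the volatile field once into a local so every use below
          // sees the same snapshot, even if another thread sets the field
          // concurrently.
          val killReason = reasonIfKilled
          if (killReason.isDefined) {
            throw new TaskKilledException(killReason.get)
          }
        }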




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104599420
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    --- End diff --
    
    Also why not make this a default argument below and then have just one method?
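    
    i.e., collapsing the two overloads into one (a sketch; the delegation target is assumed):
    
        def killTask(taskId: Long, reason: String = "cancelled"): Unit = {
          dagScheduler.killTask(taskId, reason)
        }
    
    One caveat with this approach: Java callers don't see Scala default arguments, which is one reason Spark's public API often keeps explicit overloads instead.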




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106053726
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -732,6 +732,13 @@ class DAGScheduler(
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    +    taskScheduler.killTask(taskId, true, reason)
    +  }
    --- End diff --
    
    I added this as a param to the public API, defaulting to true. It might be nice to pull the default from the job properties, but I didn't see a clean way to do this.
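    
    The resulting public API looks roughly like this (the method name matches the test later in this thread; the default reason string is an assumption):
    
        def killTaskAttempt(
            taskId: Long,
            interruptThread: Boolean = true,
            reason: String = "cancelled"): Unit = {
          dagScheduler.killTaskAttempt(taskId, interruptThread, reason)
        }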




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107073269
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
    @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason {
      * Task was killed intentionally and needs to be rescheduled.
      */
     @DeveloperApi
    -case object TaskKilled extends TaskFailedReason {
    -  override def toErrorString: String = "TaskKilled (killed intentionally)"
    +case class TaskKilled(reason: String) extends TaskFailedReason {
    +  override def toErrorString: String = s"TaskKilled ($reason)"
    --- End diff --
    
    This is unfortunately not backwards compatible. I've looked into this, but the issue is that a case object is never equal to any case class in Scala. If `TaskKilled` had been a case class to start with, compatibility might have been possible.
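    
    A small illustration of the incompatibility (sketch):
    
        sealed trait TaskFailedReason
        // Before: `case object TaskKilled extends TaskFailedReason`
        // After this PR:
        case class TaskKilled(reason: String) extends TaskFailedReason
    
        def describe(r: TaskFailedReason): String = r match {
          // User code that matched the old singleton (`case TaskKilled => ...`)
          // no longer compiles against the case class: a case object and a
          // case class of the same name are unrelated values, so the pattern
          // must now bind the reason.
          case TaskKilled(reason) => s"TaskKilled ($reason)"
        }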




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107345386
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    First, the assumption that reviveOffers is inexpensive is incorrect.
    Second, task kills happen automatically for a few reasons and can kill a bunch of tasks at once (for example, with speculative execution enabled, when one task finishes, its speculative copy is killed).
    
    This can launch a storm of reviveOffers requests.
    
    Given this, the change is incorrect and should be reverted.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74751/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74576/
    Test PASSed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    hm it might be useful to have details, but it'd also be useful to have this in the overview page without having to drill down. iiuc, the PR already has the information in the task list page, doesn't it?




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74065 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74065/testReport)** for PR 17166 at commit [`170fa34`](https://github.com/apache/spark/commit/170fa34cdc607285a315cbb5c6fd98461ac10fe8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class TaskKilled(reason: String) extends TaskFailedReason `




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107531839
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -426,15 +427,17 @@ private[spark] class Executor(
               setTaskFinishedAndClearInterruptStatus()
               execBackend.statusUpdate(taskId, TaskState.FAILED, ser.serialize(reason))
     
    -        case _: TaskKilledException =>
    -          logInfo(s"Executor killed $taskName (TID $taskId)")
    +        case t: TaskKilledException =>
    +          logInfo(s"Executor killed $taskName (TID $taskId), reason: ${t.reason}")
               setTaskFinishedAndClearInterruptStatus()
    -          execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled))
    +          execBackend.statusUpdate(taskId, TaskState.KILLED, ser.serialize(TaskKilled(t.reason)))
     
             case _: InterruptedException if task.killed =>
    -          logInfo(s"Executor interrupted and killed $taskName (TID $taskId)")
    +          val killReason = task.maybeKillReason.getOrElse("unknown reason")
    --- End diff --
    
    Can you change `if task.killed` to `if task.maybeKillReason.isDefined`, and then just do .get here? Then you could get rid of the task.killed variable and avoid the weird dependency between task.killed being set and task.maybeKillReason being defined.
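    
    i.e., roughly this (a sketch based on the diff above):
    
        case _: InterruptedException if task.maybeKillReason.isDefined =>
          val killReason = task.maybeKillReason.get
          logInfo(s"Executor interrupted and killed $taskName (TID $taskId), " +
            s"reason: $killReason")
          setTaskFinishedAndClearInterruptStatus()
          execBackend.statusUpdate(taskId, TaskState.KILLED,
            ser.serialize(TaskKilled(killReason)))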




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74697 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74697/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by vanzin <gi...@git.apache.org>.
Github user vanzin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104565430
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala ---
    @@ -30,8 +30,20 @@ private[spark] trait SchedulerBackend {
       def reviveOffers(): Unit
       def defaultParallelism(): Int
     
    -  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    +  /**
    +   * Requests that an executor kills a running task.
    +   *
    +   * @param taskId Id of the task.
    +   * @param executorId Id of the executor the task is running on.
    +   * @param interruptThread Whether the executor should interrupt the task thread.
    +   * @param reason The reason for the task kill.
    +   * @param shouldRetry Whether the scheduler should retry the task.
    +   */
    +  def killTask(
    +      taskId: Long, executorId: String, interruptThread: Boolean, reason: String,
    --- End diff --
    
    Please follow the convention defined in the "Indentation" section at http://spark.apache.org/contributing.html for long parameter lists. This happens in a bunch of methods in this PR.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74940 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74940/testReport)** for PR 17166 at commit [`f5069f7`](https://github.com/apache/spark/commit/f5069f7932193a1190e454bb211587e6fc84ecae).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    I realized you could also just let the task reason be the empty string (or some default reason) in the Executor code. That involves changing fewer LOC, but it doesn't seem like the right long-term decision: then it's weird that developers can throw a TaskKilledException but can't specify a reason, and it also leaves the current problem where the string reason propagates through the code in a way that's very hard to follow.
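    
    For context, the rejected alternative would look something like this (a sketch; the constructor shape is assumed):
    
        // A TaskKilledException whose reason falls back to a placeholder, so
        // developers could throw it without saying why -- convenient, but it
        // loses the reason instead of making callers provide one.
        class TaskKilledException(val reason: String) extends RuntimeException {
          def this() = this("unknown reason")
        }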




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104595065
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala ---
    @@ -40,7 +40,8 @@ private[spark] object CoarseGrainedClusterMessages {
       // Driver to executors
       case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage
     
    -  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean)
    +  case class KillTask(
    --- End diff --
    
    Fixed




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74751/
    Test PASSed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74941/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73917 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73917/testReport)** for PR 17166 at commit [`91b8aef`](https://github.com/apache/spark/commit/91b8aeff8adca4454b9631a0bfa01876de71bb53).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
      * `  case class KillTask(`




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74941 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74941/testReport)** for PR 17166 at commit [`884a3ad`](https://github.com/apache/spark/commit/884a3ad7308e69c0ca010c344133bcce6582920d).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75117 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75117/testReport)** for PR 17166 at commit [`3ec3633`](https://github.com/apache/spark/commit/3ec3633c3362f8c5a5d1703f21b0569d58179201).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    LGTM. I'll merge once tests pass.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104598647
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -239,8 +244,9 @@ private[spark] class Executor(
          */
         @volatile var task: Task[Any] = _
     
    -    def kill(interruptThread: Boolean): Unit = {
    -      logInfo(s"Executor is trying to kill $taskName (TID $taskId)")
    +    def kill(interruptThread: Boolean, reason: String): Unit = {
    +      logInfo(s"Executor is trying to kill $taskName (TID $taskId), reason: $reason")
    --- End diff --
    
    extra paren




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104595706
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    ah ok. for some reason i read it as killTask(long) is kill without retry, and killTask(long, string) is with.





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104778628
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    I looked at this more and this is fine.  In any case it's not harmful to call reviveOffers -- just may result in wasted work.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75000/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74057 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74057/testReport)** for PR 17166 at commit [`02d81b5`](https://github.com/apache/spark/commit/02d81b53cb7ea8a5383c800fb98c9a90c968cb8f).
     * This patch **fails from timeout after a configured wait of \`250m\`**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104594970
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    Well, it turns out there's no good reason not to retry. The task will eventually get retried anyway unless the stage is cancelled.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test failure seems unrelated. jenkins retest this please




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74053/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73913/testReport)** for PR 17166 at commit [`ba7cbd0`](https://github.com/apache/spark/commit/ba7cbd09ec0602ac8c9ad59966b2b45a70354bf7).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    What is the rationale for this change? Is it to propagate the task kill reason to the UI?
    The one line in https://github.com/apache/spark/pull/17166/files#diff-b8adb646ef90f616c34eb5c98d1ebd16R357.
    Or did I miss some other use for this?




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73916 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73916/testReport)** for PR 17166 at commit [`1a716aa`](https://github.com/apache/spark/commit/1a716aa31ff2e8e6f5d8e3b73362d28b944319f2).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class TaskKilled(reason: String, override val shouldRetry: Boolean) extends TaskFailedReason `
      * `  case class KillTask(`




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74993/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74999/testReport)** for PR 17166 at commit [`203a900`](https://github.com/apache/spark/commit/203a90020031b71d976f60491d757c4d78b85517).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73912/testReport)** for PR 17166 at commit [`e9178b6`](https://github.com/apache/spark/commit/e9178b61f356ecf4469a58a05ee4183e7beb4bf9).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75064/testReport)** for PR 17166 at commit [`a37c09b`](https://github.com/apache/spark/commit/a37c09b78ab5362e3464e8201f1839cacef8a382).




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107070457
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  // Launches one task that will run forever. Once the SparkListener detects the task has
    +  // started, kill and re-schedule it. The second run of the task will complete immediately.
    +  // If this test times out, then the first version of the task wasn't killed successfully.
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTaskAttempt(taskStart.taskInfo.taskId, true, "first attempt will hang")
    +        }
    +      }
    --- End diff --
    
    Done




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107539016
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    Without this change, the job could hang: if just one task was left, and that task got killed, I don't think reviveOffers would ever be called.
    
    @mridulm I'm not that concerned about the extra calls to reviveOffers.  In the worst case, if every task in a job is speculated (which of course can't actually happen), this leads to 2x the number of calls to reviveOffers -- so it still doesn't change the asymptotic time complexity even in the worst case.
    
    There are already a bunch of cases where we're pretty conservative with reviveOffers, in the sense that we call it even though we might not need to (e.g., when an executor dies, even if there aren't any tasks that need to be run; or every time there are speculative tasks available to run, even if there aren't any resources to run them on) so this change is in keeping with that pattern.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106789297
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  // Launches one task that will run forever. Once the SparkListener detects the task has
    +  // started, kill and re-schedule it. The second run of the task will complete immediately.
    +  // If this test times out, then the first version of the task wasn't killed successfully.
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTaskAttempt(taskStart.taskInfo.taskId, true, "first attempt will hang")
    +        }
    +      }
    +    }
    +    sc.addSparkListener(listener)
    +    eventually(timeout(20.seconds)) {
    +      sc.parallelize(1 to 1).foreach { x =>
    +        // first attempt will hang
    +        if (!SparkContextSuite.isTaskStarted) {
    +          SparkContextSuite.isTaskStarted = true
    +          Thread.sleep(9999999)
    +        }
    +        // second attempt succeeds immediately
    +      }
    --- End diff --
    
    Nothing actually runs in this test currently - we need to force the RDD's computation (with a count() or runJob() call here).
    
    Also, we need to validate that the second task actually ran (which is why the bug in the test was not detected).
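    
    One way (a hypothetical sketch in the suite's existing style; `secondAttemptRan` is an invented companion-object flag, not something in the PR) to validate that the second attempt actually ran:
    
    ```
    SparkContextSuite.secondAttemptRan = false  // hypothetical flag
    sc.parallelize(1 to 1).foreach { _ =>
      if (!SparkContextSuite.isTaskStarted) {
        SparkContextSuite.isTaskStarted = true
        Thread.sleep(9999999)  // first attempt hangs until killed
      } else {
        SparkContextSuite.secondAttemptRan = true  // only the retry gets here
      }
    }
    assert(SparkContextSuite.secondAttemptRan)
    ```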




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Why not just show "X killed" in the stage summary?  It seems like overkill to put the reasons for all of the kills there, now that I'm seeing the screenshot, since they're already in the detail view (and the whole reason we have the detail view is to show per-task info).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74940 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74940/testReport)** for PR 17166 at commit [`f5069f7`](https://github.com/apache/spark/commit/f5069f7932193a1190e454bb211587e6fc84ecae).
     * This patch **fails MiMa tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TaskKilledException(val reason: String = \"unknown reason\") extends RuntimeException`




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106051490
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    --- End diff --
    
    Done




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104599383
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    --- End diff --
    
    "Kill and reschedule the given task."?  (makes it slightly more obvious why this might be useful; "It will be retried" sounds almost like the retry is accidental)




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107271185
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
       /** List of callback functions to execute when the task fails. */
       @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
     
    -  // Whether the corresponding task has been killed.
    -  @volatile private var interrupted: Boolean = false
    +  // If defined, the corresponding task has been killed for the contained reason.
    +  @volatile private var maybeKillReason: Option[String] = None
    --- End diff --
    
    Yeah, the reason here is to allow this to be set atomically.
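    
    A stripped-down sketch (a hypothetical class, not the actual TaskContextImpl) of why one volatile Option beats a Boolean flag plus a separate String field: a single write publishes the flag and the reason together, so a reader can never observe "killed" without a reason.
    
    ```
    class KillState {
      // One volatile write atomically publishes both facts.
      @volatile private var maybeKillReason: Option[String] = None
    
      def kill(reason: String): Unit = { maybeKillReason = Some(reason) }
    
      // Readers take one consistent snapshot instead of two racy reads.
      def reasonIfKilled: Option[String] = maybeKillReason
      def isKilled: Boolean = reasonIfKilled.isDefined
    }
    ```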




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107534830
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    I don't think you can get a request storm. The _only_ case this guards against is exactly what you mentioned -- a speculative task that is killed because another attempt succeeded. The number of speculative tasks is always small so this shouldn't be an issue. In comparison, we hit this code path much more often with failed tasks.
    
    If we were to add this check back, then the task kill API would need a parameter for whether revive offers should be called. That is substantial added complexity, which we can avoid by making this simplification here. I'll leave it to @kayousterhout to decide if this is worth it.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106063119
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
           logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
             s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
             s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    -      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    +      sched.backend.killTask(
    +        attemptInfo.taskId,
    +        attemptInfo.executorId,
    +        interruptThread = true,
    +        reason = "another attempt succeeded")
    --- End diff --
    
    Can you post a screenshot of the relevant part of the UI? Is the problem just that the HTML properties don't allow columns to wrap?




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106054370
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
    @@ -64,7 +64,7 @@ private[spark] object UIData {
         var numCompletedTasks: Int = 0,
         var numSkippedTasks: Int = 0,
         var numFailedTasks: Int = 0,
    -    var numKilledTasks: Int = 0,
    +    var numKilledTasks: Map[String, Int] = Map.empty,
    --- End diff --
    
    Done




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    jenkins retest this please




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106789004
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    @kayousterhout Are we making the change that killed tasks can/should be retried?
    If yes, this is a behavior change, and we need to do the same in TSM.handleFailedTask().
    
    This is what I mentioned w.r.t. kills not resulting in task resubmission.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    jenkins retest this please




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107239694
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    @ericl Actually that is not correct.
    Killed tasks were not candidates for resubmission on failure, and hence there was no need to revive offers when task kills were detected.
    
    If they are to be made candidates, we need to make this expectation explicit elsewhere as well, to be consistent.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106788877
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskEndReason.scala ---
    @@ -212,8 +212,8 @@ case object TaskResultLost extends TaskFailedReason {
      * Task was killed intentionally and needs to be rescheduled.
      */
     @DeveloperApi
    -case object TaskKilled extends TaskFailedReason {
    -  override def toErrorString: String = "TaskKilled (killed intentionally)"
    +case class TaskKilled(reason: String) extends TaskFailedReason {
    +  override def toErrorString: String = s"TaskKilled ($reason)"
    --- End diff --
    
    Since this was part of the DeveloperApi, what is the impact of making this change? On JsonProtocol, on mocked user code?
    
    If it does introduce backward-incompatible changes, is there a way to mitigate them?
    Perhaps make TaskKilled a class with apply/unapply/serde/toString (essentially everything the case class provides) plus a case object with an apply that defaults reason = null (logging a deprecation warning when used)?
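    
    A minimal, self-contained sketch of that suggested mitigation (simplified stand-ins only: `FailedReason` is a hypothetical placeholder for TaskFailedReason, and this is not what the PR ultimately does):
    
    ```
    trait FailedReason { def toErrorString: String }
    
    // A plain class plus companion reproduces the case-class surface while
    // letting old TaskKilled-as-object call sites keep compiling.
    class TaskKilled(val reason: String) extends FailedReason {
      override def toErrorString: String = s"TaskKilled ($reason)"
      override def toString: String = s"TaskKilled($reason)"
      override def equals(other: Any): Boolean = other match {
        case t: TaskKilled => t.reason == reason
        case _             => false
      }
      override def hashCode: Int = if (reason == null) 0 else reason.hashCode
    }
    
    // The companion extends the class, so bare `TaskKilled` still works
    // as a value, with a null reason standing in for the old case object.
    object TaskKilled extends TaskKilled(null) {
      def apply(reason: String): TaskKilled = new TaskKilled(reason)
      def unapply(t: TaskKilled): Option[String] = Option(t.reason)
    }
    ```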




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74993 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74993/testReport)** for PR 17166 at commit [`884a3ad`](https://github.com/apache/spark/commit/884a3ad7308e69c0ca010c344133bcce6582920d).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `class TaskKilledException(val reason: String) extends RuntimeException `




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104773405
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -732,6 +732,13 @@ class DAGScheduler(
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    +    taskScheduler.killTask(taskId, true, reason)
    +  }
    --- End diff --
    
    Always interruptThread? Thread interruption has other implications, particularly for libraries lower in the stack.
    See https://github.com/apache/spark/pull/17113#issuecomment-284501759 for an example.
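    
    To make the hazard concrete, here is a minimal, self-contained sketch (not from the PR) of one well-known failure mode: interrupting a thread that uses an interruptible NIO channel closes the channel itself, which code lower in the stack may not recover from.
    
    ```
    import java.nio.ByteBuffer
    import java.nio.channels.{ClosedByInterruptException, FileChannel}
    import java.nio.file.{Files, StandardOpenOption}
    
    object InterruptHazardDemo {
      def main(args: Array[String]): Unit = {
        val path = Files.createTempFile("interrupt-demo", ".bin")
        Files.write(path, Array.fill[Byte](4096)(1))
        val channel = FileChannel.open(path, StandardOpenOption.READ)
    
        val reader = new Thread(() => {
          val buf = ByteBuffer.allocate(4096)
          try {
            while (true) { buf.clear(); channel.read(buf, 0) }  // keep reading
          } catch {
            // The interrupt does not just end the loop: NIO closes the channel.
            case _: ClosedByInterruptException =>
              println(s"channel open after interrupt? ${channel.isOpen}")  // false
          }
        })
        reader.start()
        Thread.sleep(100)
        reader.interrupt()  // what interruptThread = true ultimately triggers
        reader.join()
      }
    }
    ```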




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Made the change to improve the default reason, which now says "killed via SparkContext.killTaskAttempt".




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104593825
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -732,6 +732,13 @@ class DAGScheduler(
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    similar to the public api, we should separate retry from reason...





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75069 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)** for PR 17166 at commit [`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589).




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107553047
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -296,12 +298,13 @@ private[spark] class Executor(
     
             // If this task has been killed before we deserialized it, let's quit now. Otherwise,
             // continue executing the task.
    -        if (killed) {
    +        val killReason = reasonIfKilled
    --- End diff --
    
    why re-name the variable here (instead of just using reasonIfKilled below)?




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107542763
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala ---
    @@ -160,15 +160,20 @@ private[spark] abstract class Task[T](
     
       // A flag to indicate whether the task is killed. This is used in case context is not yet
       // initialized when kill() is invoked.
    -  @volatile @transient private var _killed = false
    +  @volatile @transient private var _maybeKillReason: String = null
     
       protected var _executorDeserializeTime: Long = 0
       protected var _executorDeserializeCpuTime: Long = 0
     
       /**
        * Whether the task has been killed.
        */
    -  def killed: Boolean = _killed
    +  def killed: Boolean = _maybeKillReason != null
    +
    +  /**
    +   * If this task has been killed, contains the reason for the kill.
    --- End diff --
    
    Done




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107777651
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    There might be a simple solution here to avoid extra overhead on speculative tasks. We just need to check if the task index has been marked as successful -- if so, we can skip calling reviveOffers(). How does this look?
    
    ```
    -    if (!taskSetManager.isZombie) {
    +    if (!taskSetManager.isZombie && !taskSetManager.someAttemptSucceeded(tid)) {
    ```
    
    Then in TaskSetManager,
    ```
    +  def someAttemptSucceeded(tid: Long): Boolean = {
    +    successful(taskInfos(tid).index)
    +  }
    ```




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75005 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75005/testReport)** for PR 17166 at commit [`5707715`](https://github.com/apache/spark/commit/570771555c877fc0b7a8c989e14fdaf4aa79c217).




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107273296
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -302,12 +298,12 @@ private[spark] class Executor(
     
             // If this task has been killed before we deserialized it, let's quit now. Otherwise,
             // continue executing the task.
    -        if (killed) {
    +        if (maybeKillReason.isDefined) {
               // Throw an exception rather than returning, because returning within a try{} block
               // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
               // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
               // for the task.
    -          throw new TaskKilledException
    +          throw new TaskKilledException(maybeKillReason.get)
    --- End diff --
    
    Fixed




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104598293
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    I need to do some git blaming to be 100% sure this is OK...I'll take a look tomorrow




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107559342
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,26 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Boolean = {
    +    logInfo(s"Killing task $taskId: $reason")
    +    val execId = taskIdToExecutorId.get(taskId)
    +    if (execId.isDefined) {
    +      backend.killTask(taskId, execId.get, interruptThread, reason)
    +      true
    +    } else {
    +      logInfo(s"Could not kill task $taskId because no task with that ID was found.")
    --- End diff --
    
    Done




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Rebased




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Added `killTask(id: TaskId, reason: String)` to SparkContext and a corresponding test.
    
    cc @joshrosen for the API changes. As discussed offline, it's very hard to preserve binary compatibility here since we have to move from a case object to a case class to add a reason.
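    
    For reference, a usage sketch of the killTaskAttempt form exercised by the test in this thread (the task id and reason string are illustrative only):
    
    ```
    // Kill attempt 42, interrupting its thread; per this thread, the reason
    // is propagated through the task-end event and surfaced in the web UI.
    sc.killTaskAttempt(42L, true, "straggler, retrying elsewhere")
    ```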




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104593920
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedClusterMessage.scala ---
    @@ -40,7 +40,8 @@ private[spark] object CoarseGrainedClusterMessages {
       // Driver to executors
       case class LaunchTask(data: SerializableBuffer) extends CoarseGrainedClusterMessage
     
    -  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean)
    +  case class KillTask(
    --- End diff --
    
    this also seems to fit in one line?
    
    ```
      case class KillTask(taskId: Long, executor: String, interruptThread: Boolean, reason: String)
    ```




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107541631
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
       /** List of callback functions to execute when the task fails. */
       @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
     
    -  // Whether the corresponding task has been killed.
    -  @volatile private var interrupted: Boolean = false
    +  // If defined, the corresponding task has been killed for the contained reason.
    +  @volatile private var maybeKillReason: Option[String] = None
    --- End diff --
    
    Done




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104566606
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/SchedulerBackend.scala ---
    @@ -30,8 +30,20 @@ private[spark] trait SchedulerBackend {
       def reviveOffers(): Unit
       def defaultParallelism(): Int
     
    -  def killTask(taskId: Long, executorId: String, interruptThread: Boolean): Unit =
    +  /**
    +   * Requests that an executor kills a running task.
    +   *
    +   * @param taskId Id of the task.
    +   * @param executorId Id of the executor the task is running on.
    +   * @param interruptThread Whether the executor should interrupt the task thread.
    +   * @param reason The reason for the task kill.
    +   * @param shouldRetry Whether the scheduler should retry the task.
    +   */
    +  def killTask(
    +      taskId: Long, executorId: String, interruptThread: Boolean, reason: String,
    --- End diff --
    
    fixed




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    The task killed messages should be informative, and I don't think we should sacrifice informative messages just so they can be shown concisely in the stage summary view.  I think it's much better to have an informative message that a user needs to click one link to see than it is to force the user to look in the logs to figure out what's going on (especially since behavior around tasks getting killed can be very confusing). If others feel strongly I can be convinced to add this per-task info to the summary view, but I'm not convinced of the need to sacrifice clarity for conciseness.  @mridulm @rxin what do you two think here?




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74073 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74073/testReport)** for PR 17166 at commit [`f03de61`](https://github.com/apache/spark/commit/f03de619a5e2df41fa7d70dd574159da5b4fa4db).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74057/
    Test FAILed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104593710
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    hm i don't think we should automatically retry just by providing a reason. Perhaps this
    
    ```
    def killTask(taskId: Long, reason: String): Unit
    
    def killTaskAndRetry(taskId: Long, reason: String): Unit
    ```




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107773629
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    Re (a): The cost of each revive offer is not trivial; it is a full schedule over all available active executors (happening within the driver RPC plus the CoarseGrainedSchedulerBackend.this lock).
    We would now be introducing a potentially large number of duplicate and unnecessary (from the existing jobs' point of view) full schedules, on the order of ~25% of the total number of tasks in the system (with speculative execution enabled).
    I unfortunately do not have the cluster or job(s) to give concrete numbers, but I hope I have conveyed the actual cost involved.
    
    
    Re (b): I was thrown off by the `info.killed == return` in `tsm.handleFailedTask`; I misread it as applying to the current update (it applies to the existing state).
    Given this, I completely misunderstood the semantics of killed and apologize for the long discussion!
    
    Having said that, I now do not believe there will be a job hang as you mentioned.
    With speculative execution disabled, there is a `makeOffers(executorId)` in CGSB.receive when a task's state is finished, which can reschedule the task on the same executor.
    If the executor itself goes away, as you mentioned earlier, that causes a reviveOffers, which triggers a full schedule and submission of the killed task.
    With speculative execution enabled, a periodic full schedule will pick up the killed task anyway.
    
    Given this, did I miss something that requires this change to be introduced?





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106052824
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -168,7 +168,8 @@ private[spark] class Executor(
                 case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
               }
               if (shouldCreateReaper) {
    -            val taskReaper = new TaskReaper(taskRunner, interruptThread = interruptThread)
    +            val taskReaper = new TaskReaper(
    +              taskRunner, interruptThread = interruptThread, reason = reason)
    --- End diff --
    
    I think it's reasonable to show one (arbitrary) reason since this should be a rare situation. Also updated the killTaskAttempt doc comment to reflect this.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104770757
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -168,7 +168,8 @@ private[spark] class Executor(
                 case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
               }
               if (shouldCreateReaper) {
    -            val taskReaper = new TaskReaper(taskRunner, interruptThread = interruptThread)
    +            val taskReaper = new TaskReaper(
    +              taskRunner, interruptThread = interruptThread, reason = reason)
    --- End diff --
    
    shouldCreateReaper special-cases thread interruption (and its absence, of course) - specifying a "reason" is trying to piggy-back on top of this.
    What is the developer's expectation when killTask is invoked with a message - should the 'reason' override the reason propagated by the TaskReaper? If it might not, we need to document this - the reason might not make it through even if the kill happens after the API invocation.
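    
    For reference, a minimal sketch of the condition under discussion, adapted from the diff above (`taskReaperForTask` is the executor-side map the snippet implies):
    
    ```
    // A replacement reaper is created only when escalating from a
    // non-interrupting kill to interruptThread = true, so a later kill's
    // reason may never reach a reaper that already exists.
    val shouldCreateReaper = taskReaperForTask.get(taskId) match {
      case None => true
      case Some(existingReaper) => interruptThread && !existingReaper.interruptThread
    }
    ```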
    





[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107539290
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    Also, I spent a while making sure that everything is OK in TSM.handleFailedTask, @mridulm, and all the code there seems to handle resubmission automatically. (It just didn't happen previously, when we used TaskKilled for speculative tasks, because we have a check not to re-run a task if one copy has already succeeded.)




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107631487
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    @kayousterhout Raising a couple of points here:
    
    a) Cost of enabling this.
    For larger jobs, the cost is high. The usual jobs I used to run were 50k tasks on 750 executors (outliers being up to 200k tasks on ~4500 executors).
    At 50k tasks, roughly 12.5k tasks would be speculated, which will result in about 12.5k ReviveOffers, each processing 750 executors.
    
    If I understood the examples mentioned, they lead to a comparatively much smaller number of revive offers (when an executor dies, and periodically for speculative execution - I had it at 1s rather than 100ms, IIRC).
    
    b) Expectation and preconditions for `killTaskAttempt`.
    Currently a task kill is handled internally for specific purposes - when a stage is killed, or when another attempt of the task has already completed successfully (ignoring test cases) - hopefully I did not miss others.
    In these cases, we do not need task re-execution.
    
    What are the preconditions and expectations we are exposing when a user invokes `killTaskAttempt`?
    (Also, what use cases are we trying to enable through the API?)
    
    I agree with your example about a job hang - that will occur.
    
    I assume the preconditions and expectations remain the same as what we currently support.
    That is, the task is killed and not reattempted, and the kill is performed judiciously by user code.
    If this does not hold, then we might have other issues as well (for example, the output commit coordinator will break).
    
    Having said this, I am not as familiar with the scheduler code anymore (I want to list these out before I go MIA).
    
    c) @ericl, we should first settle on the right contract/behavior to expose, and then shape the implementation to satisfy that design.
    See above regarding cost. While functionality trumps cost/performance, we should also be cognizant of it.





[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74073/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/75117/
    Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104778912
  
    --- Diff: core/src/main/scala/org/apache/spark/SparkContext.scala ---
    @@ -2250,6 +2250,25 @@ class SparkContext(config: SparkConf) extends Logging {
       }
     
       /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   */
    +  def killTask(taskId: Long): Unit = {
    +    killTask(taskId, "cancelled")
    +  }
    +
    +  /**
    +   * Kill a given task. It will be retried.
    +   *
    +   * @param taskId the task ID to kill
    +   * @param reason the reason for killing the task, which should be a short string
    +   */
    +  def killTask(taskId: Long, reason: String): Unit = {
    --- End diff --
    
    Given Mridul's points, maybe killTaskAttempt is a better name? IMO specifying "attempt" in the name makes it sound less permanent than killTask (which to me sounds like the task won't be retried).
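    
    For context, an attempt-level API along these lines could look roughly like the following sketch (not a verbatim copy of the merged code):
    
    ```
    /**
     * Kill and reschedule the given task attempt. Task ids can be obtained from
     * the Spark UI or through SparkListener.onTaskStart.
     */
    def killTaskAttempt(
        taskId: Long,
        interruptThread: Boolean = true,
        reason: String = "killed via SparkContext.killTaskAttempt"): Boolean = {
      dagScheduler.killTaskAttempt(taskId, interruptThread, reason)
    }
    ```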




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106788727
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
    @@ -354,7 +354,7 @@ private[spark] object UIUtils extends Logging {
             {completed}/{total}
             { if (failed > 0) s"($failed failed)" }
             { if (skipped > 0) s"($skipped skipped)" }
    -        { if (killed > 0) s"($killed killed)" }
    +        { reasonToNumKilled.map { case (reason, count) => s"($count killed: $reason)" } }
    --- End diff --
    
    It would be good to sort this in some stable order before exposing it,
    either by descending count or alphabetically, so that updates do not reorder the entries arbitrarily.
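    
    For example, a stable rendering could sort by descending count with an alphabetical tie-breaker (a sketch, assuming `reasonToNumKilled: Map[String, Int]`):
    
    ```
    // stable ordering: descending by count, alphabetical tie-breaker
    reasonToNumKilled.toSeq
      .sortBy { case (reason, count) => (-count, reason) }
      .map { case (reason, count) => s"($count killed: $reason)" }
    ```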




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106060313
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -538,10 +538,37 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTask(taskStart.taskInfo.taskId, "first attempt will hang")
    +        }
    +      }
    +    }
    +    sc.addSparkListener(listener)
    +    sc.parallelize(1 to 1).foreach { x =>
    --- End diff --
    
    Done




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107780463
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    Great, I updated the PR to include this.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74053 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74053/testReport)** for PR 17166 at commit [`a58d391`](https://github.com/apache/spark/commit/a58d391ecf444a6f575da13df6b2a2fbe32de861).




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107528253
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -59,8 +59,8 @@ private[spark] class TaskContextImpl(
       /** List of callback functions to execute when the task fails. */
       @transient private val onFailureCallbacks = new ArrayBuffer[TaskFailureListener]
     
    -  // Whether the corresponding task has been killed.
    -  @volatile private var interrupted: Boolean = false
    +  // If defined, the corresponding task has been killed for the contained reason.
    +  @volatile private var maybeKillReason: Option[String] = None
    --- End diff --
    
    How about calling this `reasonIfKilled`, here and elsewhere? (If you strongly prefer the existing name, it's fine to leave as-is -- I just slightly prefer making it somewhat more obvious that this and the fact that the task has been killed are tightly intertwined.)
    
    In any case, can you expand the comment a bit, to match the one you used below: "If specified, this task has been killed and this option contains the reason."




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74065/testReport)** for PR 17166 at commit [`170fa34`](https://github.com/apache/spark/commit/170fa34cdc607285a315cbb5c6fd98461ac10fe8).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74702/testReport)** for PR 17166 at commit [`8f7ffb3`](https://github.com/apache/spark/commit/8f7ffb395cae9ae7aa24a14dcdb908aaee30b710).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106054145
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -710,7 +710,11 @@ private[spark] class TaskSetManager(
           logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
             s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
             s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    -      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    +      sched.backend.killTask(
    +        attemptInfo.taskId,
    +        attemptInfo.executorId,
    +        interruptThread = true,
    +        reason = "another attempt succeeded")
    --- End diff --
    
    As above, this would cause the progress bar to overflow in the UI. I think we should stick to short strings for now -- if users find this particularly useful, we can add a long-form reason in another PR.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107233146
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -302,12 +298,12 @@ private[spark] class Executor(
     
             // If this task has been killed before we deserialized it, let's quit now. Otherwise,
             // continue executing the task.
    -        if (killed) {
    +        if (maybeKillReason.isDefined) {
               // Throw an exception rather than returning, because returning within a try{} block
               // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
               // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
               // for the task.
    -          throw new TaskKilledException
    +          throw new TaskKilledException(maybeKillReason.get)
    --- End diff --
    
    The same applies here - atomic use of `maybeKillReason` is required.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107797887
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +479,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie && !taskSetManager.someAttemptSucceeded(tid)) {
    --- End diff --
    
    This should do it IMO, thanks!
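    
    For reference, `someAttemptSucceeded` can be a thin lookup on TaskSetManager state; a sketch, under the assumption that `taskInfos` maps task ids to TaskInfo and `successful` is indexed by the task's index within the task set:
    
    ```
    private[scheduler] def someAttemptSucceeded(tid: Long): Boolean = {
      // true if any attempt of the same task-set index has already succeeded
      successful(taskInfos(tid).index)
    }
    ```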




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107070597
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -540,6 +540,39 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  // Launches one task that will run forever. Once the SparkListener detects the task has
    +  // started, kill and re-schedule it. The second run of the task will complete immediately.
    +  // If this test times out, then the first version of the task wasn't killed successfully.
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    +    val listener = new SparkListener {
    +      override def onTaskStart(taskStart: SparkListenerTaskStart): Unit = {
    +        eventually(timeout(10.seconds)) {
    +          assert(SparkContextSuite.isTaskStarted)
    +        }
    +        if (!SparkContextSuite.taskKilled) {
    +          SparkContextSuite.taskKilled = true
    +          sc.killTaskAttempt(taskStart.taskInfo.taskId, true, "first attempt will hang")
    +        }
    +      }
    +    }
    +    sc.addSparkListener(listener)
    +    eventually(timeout(20.seconds)) {
    +      sc.parallelize(1 to 1).foreach { x =>
    +        // first attempt will hang
    +        if (!SparkContextSuite.isTaskStarted) {
    +          SparkContextSuite.isTaskStarted = true
    +          Thread.sleep(9999999)
    +        }
    +        // second attempt succeeds immediately
    +      }
    --- End diff --
    
    `foreach` is an action so actually it does run, but I added the verification just in case.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107542020
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,7 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            throw new TaskKilledException(context.getKillReason().getOrElse("unknown reason"))
    --- End diff --
    
    @mridulm pointed out that if the kill reason were reset to None by a concurrent thread, this would crash. That said, it is true that this can't happen in the current implementation.
    
    If you think it's clearer, we could throw an AssertionError in this case.
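    
    That fail-fast variant could read as follows (sketch):
    
    ```
    case e: Exception if context.isInterrupted =>
      logDebug("Exception thrown after task interruption", e)
      // fail loudly rather than report "unknown reason" if the reason were
      // ever unset concurrently (not possible in the current implementation)
      val reason = context.getKillReason().getOrElse(
        throw new AssertionError("task interrupted but no kill reason set"))
      throw new TaskKilledException(reason)
    ```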




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104598515
  
    --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/executor/MesosExecutorBackend.scala ---
    @@ -104,7 +104,8 @@ private[spark] class MesosExecutorBackend
           logError("Received KillTask but executor was null")
         } else {
           // TODO: Determine the 'interruptOnCancel' property set for the given job.
    -      executor.killTask(t.getValue.toLong, interruptThread = false)
    +      executor.killTask(
    +        t.getValue.toLong, interruptThread = false, reason = "killed intentionally")
    --- End diff --
    
    can you make this say "killed by Mesos" instead? (assuming this is correct?) "intentionally" doesn't provide much value since it's always intentional




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107797382
  
    --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala ---
    @@ -215,7 +215,7 @@ private[spark] class PythonRunner(
     
               case e: Exception if context.isInterrupted =>
                 logDebug("Exception thrown after task interruption", e)
    -            throw new TaskKilledException
    +            throw new TaskKilledException(context.getKillReason().getOrElse("unknown reason"))
    --- End diff --
    
    @kayousterhout I actually had not considered this one, but rather the use of maybeKillReason in Executor and other places; this was a nice catch by @ericl.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74053 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74053/testReport)** for PR 17166 at commit [`a58d391`](https://github.com/apache/spark/commit/a58d391ecf444a6f575da13df6b2a2fbe32de861).
     * This patch **fails from timeout after a configured wait of `250m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74679/
    Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r106060305
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -538,10 +538,37 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
         }
       }
     
    +  test("Killing tasks") {
    +    sc = new SparkContext(new SparkConf().setAppName("test").setMaster("local"))
    +
    +    SparkContextSuite.isTaskStarted = false
    +    SparkContextSuite.taskKilled = false
    +
    --- End diff --
    
    Done




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75130 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75130/testReport)** for PR 17166 at commit [`8c4381f`](https://github.com/apache/spark/commit/8c4381f7eb66d4bc6265be28a82f91ef9f3fa0c0).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #73913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/73913/testReport)** for PR 17166 at commit [`ba7cbd0`](https://github.com/apache/spark/commit/ba7cbd09ec0602ac8c9ad59966b2b45a70354bf7).




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #74572 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74572/testReport)** for PR 17166 at commit [`31967d1`](https://github.com/apache/spark/commit/31967d185852870d8edecb855ea1aafb7bd04dd1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `  case class KillTask(taskId: Long, executor: String, interruptThread: Boolean, reason: String)`




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107232894
  
    --- Diff: core/src/main/scala/org/apache/spark/TaskContextImpl.scala ---
    @@ -140,16 +140,22 @@ private[spark] class TaskContextImpl(
       }
     
       /** Marks the task for interruption, i.e. cancellation. */
    -  private[spark] def markInterrupted(): Unit = {
    -    interrupted = true
    +  private[spark] def markInterrupted(reason: String): Unit = {
    +    maybeKillReason = Some(reason)
    +  }
    +
    +  private[spark] override def killTaskIfInterrupted(): Unit = {
    +    if (maybeKillReason.isDefined) {
    +      throw new TaskKilledException(maybeKillReason.get)
    --- End diff --
    
    This is not thread-safe - while technically we do not allow the kill reason to be reset to None right now, so it might be fine, it can lead to future issues.
    
    Either make all accesses/updates to the kill reason synchronized, or capture `maybeKillReason` into a local variable and use that in both the `if` and the `throw`.
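    
    The local-capture option would read like this (a sketch of the second suggestion):
    
    ```
    private[spark] override def killTaskIfInterrupted(): Unit = {
      // read the volatile field once so the check and the throw agree
      val reason = maybeKillReason
      if (reason.isDefined) {
        throw new TaskKilledException(reason.get)
      }
    }
    ```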




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107235395
  
    --- Diff: core/src/test/scala/org/apache/spark/SparkContextSuite.scala ---
    @@ -569,8 +575,10 @@ class SparkContextSuite extends SparkFunSuite with LocalSparkContext with Eventu
               Thread.sleep(9999999)
             }
             // second attempt succeeds immediately
    +        SparkContextSuite.taskSucceeded = true
           }
         }
    +    assert(SparkContextSuite.taskSucceeded)
    --- End diff --
    
    Are both the listener and the task setting taskSucceeded? That does not look right...
    I am assuming we need one failure to be raised with the appropriate message, plus one task success, to validate the listener's view.
    Additionally, the re-execution of the task should indicate the task's success (though this aspect should be covered by some other test already).
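    
    One way to keep the signals separate is to record the kill from the listener and the success from the task body, e.g. (a sketch; the flag name and the hook are illustrative, the matched reason string follows the test above):
    
    ```
    // illustrative listener hook: record that an attempt was killed with the
    // expected reason, independently of the re-run attempt's success flag
    override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
      taskEnd.reason match {
        case TaskKilled("first attempt will hang") =>
          SparkContextSuite.firstAttemptKilled = true
        case _ =>
      }
    }
    ```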




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/73916/
    Test FAILed.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74570/
    Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107553383
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,26 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Boolean = {
    +    logInfo(s"Killing task $taskId: $reason")
    +    val execId = taskIdToExecutorId.get(taskId)
    +    if (execId.isDefined) {
    +      backend.killTask(taskId, execId.get, interruptThread, reason)
    +      true
    +    } else {
    +      logInfo(s"Could not kill task $taskId because no task with that ID was found.")
    --- End diff --
    
    logWarning?




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107533896
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,21 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    +    logInfo(s"Killing task ($reason): $taskId")
    +    val execId = taskIdToExecutorId.getOrElse(
    +      taskId, throw new IllegalArgumentException("Task not found: " + taskId))
    --- End diff --
    
    similarly, how about s"Cannot kill task $taskId because no task with that ID was found."




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104599195
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskScheduler.scala ---
    @@ -54,6 +54,9 @@ private[spark] trait TaskScheduler {
       // Cancel a stage.
       def cancelTasks(stageId: Int, interruptThread: Boolean): Unit
     
    +  // Kill a task.
    --- End diff --
    
    A comment isn't very helpful if it just adds "a" to the method name :). Can you remove this? Also, as above, "killAndReschedule" would be a better name here.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    LGTM -- this looks great.  Thanks for coming up with a simple way to address @mridulm's feedback Eric!




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r104600104
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/jobs/UIData.scala ---
    @@ -64,7 +64,7 @@ private[spark] object UIData {
         var numCompletedTasks: Int = 0,
         var numSkippedTasks: Int = 0,
         var numFailedTasks: Int = 0,
    -    var numKilledTasks: Int = 0,
    +    var numKilledTasks: Map[String, Int] = Map.empty,
    --- End diff --
    
    reasonToNumKilledTasks? (here and the other two places)




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75069 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75069/testReport)** for PR 17166 at commit [`71b41b3`](https://github.com/apache/spark/commit/71b41b3ea11d4d3490fdc1ac9061e501ae0f8589).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74065/
    Test PASSed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74702/
    Test FAILed.


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    Merged build finished. Test PASSed.


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107542944
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -239,14 +239,21 @@ private[spark] class TaskSchedulerImpl private[scheduler](
             //    simply abort the stage.
             tsm.runningTasksSet.foreach { tid =>
               val execId = taskIdToExecutorId(tid)
    -          backend.killTask(tid, execId, interruptThread)
    +          backend.killTask(tid, execId, interruptThread, reason = "stage cancelled")
             }
             tsm.abort("Stage %s cancelled".format(stageId))
             logInfo("Stage %d was cancelled".format(stageId))
           }
         }
       }
     
    +  override def killTaskAttempt(taskId: Long, interruptThread: Boolean, reason: String): Unit = {
    +    logInfo(s"Killing task ($reason): $taskId")
    --- End diff --
    
    Done
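
To make the shape of this kill path concrete, here is a small self-contained Scala sketch; the trait and class names below are invented for illustration and are not the real SchedulerBackend API:

    // Sketch of a backend whose kill request carries a human-readable reason.
    trait KillableBackend {
      def killTask(taskId: Long, executorId: String, interruptThread: Boolean, reason: String): Unit
    }

    class LoggingBackend extends KillableBackend {
      override def killTask(
          taskId: Long,
          executorId: String,
          interruptThread: Boolean,
          reason: String): Unit = {
        // The reason travels with the kill request so downstream events and
        // the UI can display "TaskKilled: $reason" rather than a bare "TaskKilled".
        println(s"Killing task $taskId on $executorId ($reason), interrupt=$interruptThread")
      }
    }

    object KillExample {
      def main(args: Array[String]): Unit = {
        val backend: KillableBackend = new LoggingBackend
        // Mirrors the call site in the diff: a stage cancellation kills its running tasks.
        backend.killTask(42L, executorId = "exec-1", interruptThread = false, reason = "stage cancelled")
      }
    }

Passing the reason as a named argument at the call site, as the diff does with `reason = "stage cancelled"`, keeps the intent of the kill visible where it is issued.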


[GitHub] spark pull request #17166: [SPARK-19820] [core] Allow reason to be specified...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17166#discussion_r107740806
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -467,7 +474,7 @@ private[spark] class TaskSchedulerImpl private[scheduler](
           taskState: TaskState,
           reason: TaskFailedReason): Unit = synchronized {
         taskSetManager.handleFailedTask(tid, taskState, reason)
    -    if (!taskSetManager.isZombie && taskState != TaskState.KILLED) {
    +    if (!taskSetManager.isZombie) {
    --- End diff --
    
    @mridulm for (a), what evidence do you have that reviveOffers is costly and the bottleneck in scheduling a large job?  I agree that we're adding many reviveOffers calls in the case of large jobs -- but for the other jobs I've benchmarked in the past, I haven't seen this be a bottleneck (and when all of a job's tasks have started running, reviveOffers should be very fast).
    
    Re: (b), I did an extensive review of the associated scheduler code and tasks *are* re-attempted (as was verified by Eric's test).  As I mentioned above, the only reason they're not re-attempted in the current uses of TaskKilled is because either the stage has been killed (so the task set is marked as a zombie) or because the task attempt has already succeeded elsewhere (as for speculative tasks).  Also the Mesos scheduler code uses TaskKilled in the same way as this PR (where Mesos may kill a task that should be re-scheduled elsewhere).  If you don't think that killed tasks that haven't succeeded elsewhere will be re-run, can you point to the specific code?
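
The retry behavior under debate can be reduced to a small model. The following Scala sketch uses assumed names and illustrates only the control flow; it is not the actual TaskSchedulerImpl code:

    class TaskSetState(var isZombie: Boolean)

    object FailureHandling {
      // After a failed or killed task is processed, revive offers whenever the
      // task set is still live, so a killed-but-unfinished task is re-attempted.
      // A zombie task set (e.g. a cancelled stage) is left alone.
      def afterTaskFailure(tsm: TaskSetState, reviveOffers: () => Unit): Unit = {
        if (!tsm.isZombie) {
          // The old code also required taskState != KILLED here; the PR drops
          // that special case so killed tasks go through the same retry path.
          reviveOffers()
        }
      }

      def main(args: Array[String]): Unit = {
        afterTaskFailure(new TaskSetState(isZombie = false), () => println("reviveOffers called"))
        afterTaskFailure(new TaskSetState(isZombie = true), () => println("never printed"))
      }
    }

Under this model, a killed task whose task set is still live frees its slot and immediately triggers a new round of offers, so it can be re-attempted; only a zombie (cancelled or already-satisfied) task set skips the revive.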


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    **[Test build #75000 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75000/testReport)** for PR 17166 at commit [`6e8593b`](https://github.com/apache/spark/commit/6e8593b9bb88a2b0bf90e39887368cc4535480b6).


[GitHub] spark issue #17166: [SPARK-19820] [core] Allow reason to be specified for ta...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on the issue:

    https://github.com/apache/spark/pull/17166
  
    That's right, it's not here. This PR only adds the distinction between
    tasks killed due to stage cancellation and tasks killed because another
    attempt already succeeded (speculation).
    
    On Sun, Mar 5, 2017, 3:04 AM Mridul Muralidharan <no...@github.com>
    wrote:
    
    > If I did not miss it, there is no way for the user to provide this
    > information currently, right?
    > Or is that coming in a subsequent PR?