Posted to reviews@spark.apache.org by devaraj-kavali <gi...@git.apache.org> on 2016/03/28 11:20:16 UTC

[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

GitHub user devaraj-kavali opened a pull request:

    https://github.com/apache/spark/pull/11996

    [SPARK-10530] [CORE] Kill other task attempts when one task attempt belonging to the same task succeeds in speculation

    ## What changes were proposed in this pull request?
    
    With this patch, TaskSetManager kills the other running attempts as soon as any attempt of the same task succeeds. Killed attempts are no longer counted as failed tasks; they are listed separately in the UI and report the task state KILLED instead of FAILED.
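
    For reference, a condensed sketch of the new logic, lifted from the diff reviewed later in this thread (`taskAttempts`, `index`, and `info` are locals in `TaskSetManager.handleSuccessfulTask`):

    ```scala
    // After reporting the successful attempt to the DAGScheduler, kill every
    // other attempt of the same task that is still running.
    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
        && attemptInfo.running) {
      logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task " + attemptInfo.id +
        " in stage " + taskSet.id + " (TID " + attemptInfo.taskId + ") on " + attemptInfo.host +
        " as the attempt " + info.attemptNumber + " succeeded on " + info.host)
      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    }
    ```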
    
    
    ## How was this patch tested?
    
    core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala
    core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala
    
    
    I have also verified this patch manually with spark.speculation set to true: when any attempt succeeds, the other running attempts of the same task are killed and pending tasks are assigned in their place. Killed attempts are reported as KILLED tasks rather than FAILED tasks. Please see the attached screenshots for reference.
    
    ![stage-tasks-table](https://cloud.githubusercontent.com/assets/3174804/14075132/394c6a12-f4f4-11e5-8638-20ff7b8cc9bc.png)
    ![stages-table](https://cloud.githubusercontent.com/assets/3174804/14075134/3b60f412-f4f4-11e5-9ea6-dd0dcc86eb03.png)
    
    
    Ref : https://github.com/apache/spark/pull/11916

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/devaraj-kavali/spark SPARK-10530

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11996.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11996
    
----
commit 1a9e36516e9016f43a605abce0ee49e1262363a6
Author: Devaraj K <de...@apache.org>
Date:   2016-03-28T09:03:07Z

    [SPARK-10530] [CORE] Kill other task attempts when one task attempt
    belonging to the same task succeeds in speculation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63446802
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    --- End diff --
    
    Can you change this to "Kill any other attempts for the same task (since those are unnecessary now that one attempt completed successfully)."




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63738214
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    Adding an argument to **checkSpeculatableTasks()** would change the signature of the method in the Schedulable interface and, correspondingly, all of its implementations. Instead, I am thinking of moving the code in **TaskSetManager.checkSpeculatableTasks()** into another method that takes an argument (i.e. minTimeToSpeculation: Int), so the same method can be used in the test. Please give your opinion on this.
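
    A minimal sketch of that proposal (helper name and visibility are illustrative, not final):

    ```scala
    // Schedulable's signature stays unchanged; the parameterless method delegates
    // to a helper that tests can call directly with a smaller threshold.
    override def checkSpeculatableTasks(): Boolean = {
      checkSpeculatableTasks(minTimeToSpeculation = 100)
    }

    private[scheduler] def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean = {
      // ... existing speculation logic, with the hard-coded 100 replaced by
      // minTimeToSpeculation ...
      false // placeholder return for this sketch
    }
    ```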




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r58110175
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    +    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
    --- End diff --
    
    it would be nice if we could add unit test for this in TaskSetManagerSuite




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-220082266
  
    **[Test build #58791 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58791/consoleFull)** for PR 11996 at commit [`4aa7e83`](https://github.com/apache/spark/commit/4aa7e83bd375b04e550fcb4cb18a8bcfc8e78e17).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202407567
  
    **[Test build #54314 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54314/consoleFull)** for PR 11996 at commit [`1a9e365`](https://github.com/apache/spark/commit/1a9e36516e9016f43a605abce0ee49e1262363a6).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222010203
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r58569079
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    Ah, you are right; sorry, I looked at the wrong config.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-205755555
  
    **[Test build #54979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54979/consoleFull)** for PR 11996 at commit [`ba9ffab`](https://github.com/apache/spark/commit/ba9ffab65f9f003af3a27671b8610525c2e38d84).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64813822
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSchedulerImpl.scala ---
    @@ -64,6 +64,8 @@ private[spark] class TaskSchedulerImpl(
       // How often to check for speculative tasks
       val SPECULATION_INTERVAL_MS = conf.getTimeAsMs("spark.speculation.interval", "100ms")
     
    +  val MIN_TIME_TO_SPECULATION = 100
    --- End diff --
    
    Can you add a comment here? Something like "Duplicate copies of a task will only be launched if the original copy has been running for at least this amount of time. This is to avoid the overhead of launching speculative copies of tasks that are very short."
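
    Applied to the line above, the suggested comment would read:

    ```scala
    // Duplicate copies of a task will only be launched if the original copy has been
    // running for at least this amount of time. This is to avoid the overhead of
    // launching speculative copies of tasks that are very short.
    val MIN_TIME_TO_SPECULATION = 100
    ```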




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63449411
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    +    val speculation = manager.checkSpeculatableTasks
    +    assert(speculation === true)
    +    // Offer resource to start the speculative attempt for the running task
    +    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
    +    assert(taskOption5.isDefined)
    +    val task5 = taskOption5.get
    +    assert(task5.taskId === 4)
    +    assert(task5.executorId === "exec1")
    +    assert(task5.attemptNumber === 1)
    +    sched.backend = mock(classOf[SchedulerBackend])
    +    // Complete the speculative attempt for the running task
    +    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
    +    assert(sched.endedTasks(3) === Success)
    --- End diff --
    
    Why isn't this Killed? My understanding is that the task set finishing should trigger this call: https://github.com/devaraj-kavali/spark/blob/ba9ffab65f9f003af3a27671b8610525c2e38d84/core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala#L733 which will cause the FakeDagScheduler to set sched.endedTasks(3) to Killed.  Or does that happen *before* the successful attempt happens, so the successful one overrides it? (in any case, can you add a comment describing this?)




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219698137
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221970385
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222009995
  
    **[Test build #59412 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59412/consoleFull)** for PR 11996 at commit [`b05908c`](https://github.com/apache/spark/commit/b05908c2dfcee00cb732a72bb6ab1000a5cf5cd0).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r58604142
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    +    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
    +        && attemptInfo.running) {
    +      logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task " + attemptInfo.id +
    +        " in stage " + taskSet.id + " (TID " + attemptInfo.taskId + ") on " + attemptInfo.host +
    +        " as the attempt " + info.attemptNumber + " succeeded on " + info.host)
    +      sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
    --- End diff --
    
    I think it would be better to have a killTask call in the taskScheduler (similar to cancelTask) rather than reaching in and getting the backend directly.
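
    A rough sketch of that shape, assuming a pass-through method on TaskSchedulerImpl (the name killTaskAttempt is illustrative, not an existing API):

    ```scala
    // A kill entry point alongside cancelTasks, so TaskSetManager talks to the
    // scheduler instead of reaching into the backend directly.
    def killTaskAttempt(taskId: Long, executorId: String, interruptThread: Boolean): Unit = {
      logInfo(s"Killing task attempt $taskId on executor $executorId")
      backend.killTask(taskId, executorId, interruptThread)
    }
    ```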




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-205789547
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54979/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63447507
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
    @@ -71,11 +73,16 @@ class TaskInfo(
         failed = true
       }
     
    +  private[spark] def markKilled(time: Long = System.currentTimeMillis) {
    --- End diff --
    
    Can you consolidate this and the two methods above into a single markFinished method that accepts a TaskState and a time? And then that method can handle matching the TaskState to the appropriate value of killed / failed.
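
    A sketch of the consolidated method (assuming the FAILED/KILLED values of org.apache.spark.TaskState; the merged code may differ in detail):

    ```scala
    // One method records the finish time and maps the terminal TaskState to the
    // matching flag, replacing markSuccessful/markFailed/markKilled.
    private[spark] def markFinished(state: TaskState, time: Long = System.currentTimeMillis) {
      finishTime = time
      if (state == TaskState.FAILED) {
        failed = true
      } else if (state == TaskState.KILLED) {
        killed = true
      }
    }
    ```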




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222556829
  
    I merged this into master (so it will be in 2.1); thanks for your work on this @devaraj-kavali!




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221981351
  
    Looks like there are build errors: you need to update the makeProgressBar calls in streaming/src/main/scala/org/apache/spark/streaming/ui/AllBatchesTable.scala:95 and streaming/src/main/scala/org/apache/spark/streaming/ui/BatchPage.scala:144




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64655722
  
    --- Diff: core/src/test/scala/org/apache/spark/ui/jobs/JobProgressListenerSuite.scala ---
    @@ -254,6 +253,11 @@ class JobProgressListenerSuite extends SparkFunSuite with LocalSparkContext with
           assert(listener.stageIdToData((task.stageId, 0)).numFailedTasks === failCount)
         }
     
    +    // Check the killed tasks count.
    --- End diff --
    
    change to "Make sure killed tasks are accounted for correctly"?




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-220114675
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58791/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202407101
  
    ok to test




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221968760
  
    **[Test build #59404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59404/consoleFull)** for PR 11996 at commit [`db93d17`](https://github.com/apache/spark/commit/db93d1766f39d4f2d9fe9be0d1ce9edd68110f76).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63447725
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
    @@ -337,6 +337,16 @@ private[spark] object UIUtils extends Logging {
           failed: Int,
           skipped: Int,
           total: Int): Seq[Node] = {
    +    makeProgressBar(started, completed, failed, skipped, 0, total)
    --- End diff --
    
    can you use a named param here ("killed = 0")?
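
    For example (naming the trailing argument as well, since in the Scala version Spark used at the time a positional argument could not follow a named one):

    ```scala
    // The old overload delegates to the new one; naming the new argument makes
    // the literal 0 self-documenting.
    makeProgressBar(started, completed, failed, skipped, killed = 0, total = total)
    ```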




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-206019800
  
    For scheduler changes let's ping @kayousterhout and @markhamstra




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63448866
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    What about adding an argument to checkSpeculatableTasks to control the magic-100 value, so something like
    
    checkSpeculatableTasks(minTimeToSpeculation: Int  = 100) and then we can set it to 0 for tests? In general it's nice to avoid adding sleeps to the (already slow) tests.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222103762
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59477/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222082443
  
    **[Test build #59477 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59477/consoleFull)** for PR 11996 at commit [`8767e4c`](https://github.com/apache/spark/commit/8767e4cd7a764a4aec080fdbf7669cb1f8bfd195).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222104830
  
    @kayousterhout, I have added the inline comments and the build is fine now; please take a look. Thanks




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202318667
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63446363
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    +    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
    +        && attemptInfo.running) {
    +      logInfo("Killing attempt " + attemptInfo.attemptNumber + " for task " + attemptInfo.id +
    --- End diff --
    
    Can you use string interpolation to make this more concise / readable? (e.g., s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} in stage " ...
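
    i.e., something like:

    ```scala
    // Same message, using s-interpolation instead of concatenation.
    logInfo(s"Killing attempt ${attemptInfo.attemptNumber} for task ${attemptInfo.id} " +
      s"in stage ${taskSet.id} (TID ${attemptInfo.taskId}) on ${attemptInfo.host} " +
      s"as the attempt ${info.attemptNumber} succeeded on ${info.host}")
    ```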




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219716906
  
    **[Test build #58693 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58693/consoleFull)** for PR 11996 at commit [`5f2eee8`](https://github.com/apache/spark/commit/5f2eee86228aea6a397b41f5df5644f13cf788c1).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64655524
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    This does seem a little trickier than I'd anticipated.  I think the best thing to do is (1) change the Schedulable class to take a minTimeToSpeculation required argument.  This looks pretty simple to do -- you just need to change two implementations, and this is a private spark class, so we're not changing a public or developer API. (2) add a constant  MIN_TIME_TO_SPECULATION in the TaskSchedulerImpl object, and pass that value in when TaskSchedulerImpl calls checkSpeculatableTasks.
    
    I think over time we should have more tests to verify the speculation behavior, so this relatively small change to make this code path more testable seems worthwhile.
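
    A sketch of that plan (the constant's placement follows the comment above; other details are illustrative):

    ```scala
    // (1) Schedulable takes the threshold as a required argument, so both
    //     implementations (Pool and TaskSetManager) thread it through.
    private[spark] trait Schedulable {
      // ... other members ...
      def checkSpeculatableTasks(minTimeToSpeculation: Int): Boolean
    }

    // (2) TaskSchedulerImpl owns the production value and passes it in when it
    //     polls for speculatable tasks.
    val MIN_TIME_TO_SPECULATION = 100
    // ...
    val shouldRevive = rootPool.checkSpeculatableTasks(MIN_TIME_TO_SPECULATION)
    ```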




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202470953
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54314/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-205789542
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222103760
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202470945
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r58563479
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    Thanks @tgravescs for your quick response.
    
    Here Thread.sleep(100) matches the threshold used in TaskSetManager.checkSpeculatableTasks(): it is the minimum time a task must have been running before it becomes eligible for a speculative attempt. I don't see any way to change this default value.
    
    > val medianDuration = durations(min((0.5 * tasksSuccessful).round.toInt, durations.length - 1))
    > val threshold = max(SPECULATION_MULTIPLIER * medianDuration, 100)
    > 
    
    I don't think this threshold value is related to the config ‘spark.speculation.interval’ here.
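
    As a concrete example of the floor (assuming the default spark.speculation.multiplier of 1.5 and a hypothetical 20 ms median duration of the successful tasks):

    ```scala
    // threshold = max(1.5 * 20, 100) = 100 ms: for short tasks the 100 ms floor
    // dominates, and that floor is what the Thread.sleep(100) waits out.
    val threshold = math.max(1.5 * 20, 100)
    ```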





[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-202470303
  
    **[Test build #54314 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54314/consoleFull)** for PR 11996 at commit [`1a9e365`](https://github.com/apache/spark/commit/1a9e36516e9016f43a605abce0ee49e1262363a6).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219697407
  
    **[Test build #58689 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58689/consoleFull)** for PR 11996 at commit [`08f636b`](https://github.com/apache/spark/commit/08f636b2a8ba74f9415aa200bb319899a84f6cf8).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64656455
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,52 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = taskSet.tasks.map { task =>
    +      task.metrics.internalAccums
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    +    assert(manager.checkSpeculatableTasks)
    +    // Offer resource to start the speculative attempt for the running task
    +    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
    +    assert(taskOption5.isDefined)
    +    val task5 = taskOption5.get
    +    assert(task5.taskId === 4)
    +    assert(task5.executorId === "exec1")
    +    assert(task5.attemptNumber === 1)
    --- End diff --
    
    Can you also check that task5.index === 3 (maybe put this first)?
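
    That is, a sketch of the reordered assertions:

        val task5 = taskOption5.get
        assert(task5.index === 3)  // the speculative copy targets the one still-running task
        assert(task5.taskId === 4)
        assert(task5.executorId === "exec1")
        assert(task5.attemptNumber === 1)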




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219698135
  
    **[Test build #58689 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58689/consoleFull)** for PR 11996 at commit [`08f636b`](https://github.com/apache/spark/commit/08f636b2a8ba74f9415aa200bb319899a84f6cf8).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63736986
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    +    val speculation = manager.checkSpeculatableTasks
    +    assert(speculation === true)
    +    // Offer resource to start the speculative attempt for the running task
    +    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
    +    assert(taskOption5.isDefined)
    +    val task5 = taskOption5.get
    +    assert(task5.taskId === 4)
    +    assert(task5.executorId === "exec1")
    +    assert(task5.attemptNumber === 1)
    +    sched.backend = mock(classOf[SchedulerBackend])
    +    // Complete the speculative attempt for the running task
    +    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
    +    assert(sched.endedTasks(3) === Success)
    --- End diff --
    
    Here **sched.backend** is **mock(classOf[SchedulerBackend])**, and as part of **manager.handleSuccessfulTask()** it issues **sched.backend.killTask()** for any other running attempts. Since the backend is a mock, the test only verifies that the kill invocation for the other attempts actually happens. I have added a comment explaining this.
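
    A minimal sketch of making that expectation explicit with Mockito
    (imports shown for Mockito 1.x's Matchers; wildcard matchers are used
    because the exact killTask arguments are not asserted here):

        import org.mockito.Matchers.{anyBoolean, anyLong, anyString}
        import org.mockito.Mockito.{mock, verify}

        sched.backend = mock(classOf[SchedulerBackend])
        // Complete the speculative attempt for the running task
        manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
        // confirm the kill for the other running attempt was actually issued
        verify(sched.backend).killTask(anyLong(), anyString(), anyBoolean())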




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64653035
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
    @@ -20,6 +20,8 @@ package org.apache.spark.scheduler
     import scala.collection.mutable.ListBuffer
     
     import org.apache.spark.annotation.DeveloperApi
    +import org.apache.spark.TaskState
    --- End diff --
    
    alphabetization (capital letters should come before lowercase -- so this and the import below it should both come above org.apache.spark.annotation.DeveloperApi)
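
    That is, assuming the neighboring import is TaskState.TaskState, the
    block would read ('T' sorts before lowercase 'a' in ASCII):

        import org.apache.spark.TaskState
        import org.apache.spark.TaskState.TaskState
        import org.apache.spark.annotation.DeveloperApi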




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/11996




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219698138
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58689/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219752590
  
    **[Test build #58693 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58693/consoleFull)** for PR 11996 at commit [`5f2eee8`](https://github.com/apache/spark/commit/5f2eee86228aea6a397b41f5df5644f13cf788c1).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221968586
  
    @kayousterhout Thanks a lot for your review and comments. I have fixed them; please take a look.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63448080
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskSetManager.scala ---
    @@ -620,6 +620,14 @@ private[spark] class TaskSetManager(
         // Note: "result.value()" only deserializes the value when it's called at the first time, so
         // here "result.value()" just returns the value and won't block other threads.
         sched.dagScheduler.taskEnded(tasks(index), Success, result.value(), result.accumUpdates, info)
    +    // Kill other task attempts if any as the one attempt succeeded
    +    for (attemptInfo <- taskAttempts(index) if attemptInfo.attemptNumber != info.attemptNumber
    --- End diff --
    
    I think you don't need the middle condition here (if attemptInfo.attemptNumber != info.attemptNumber): since markSuccessful() was called above, this attempt is no longer running, so the last condition already filters it out?
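
    A sketch of the simplified loop (the killTask call and the true
    interruptThread flag are shown for illustration):

        // markSuccessful() already cleared `running` on the finished attempt,
        // so filtering on `running` alone is sufficient.
        for (attemptInfo <- taskAttempts(index) if attemptInfo.running) {
          sched.backend.killTask(attemptInfo.taskId, attemptInfo.executorId, true)
        }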




[GitHub] spark issue #11996: [SPARK-10530] [CORE] Kill other task attempts when one t...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the issue:

    https://github.com/apache/spark/pull/11996
  
    @lw-lin I think it will release the resources and then throw TaskKilledException at [Executor.scala#L307](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L307). If you are facing this issue, please file a separate ticket with the details and we can discuss it there.
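
    A simplified, illustrative sketch of that executor-side flow (not the
    actual Spark source; releaseResources() is a hypothetical stand-in for
    the cleanup done in the real finally block):

        try {
          task.run()
        } finally {
          releaseResources()  // cleanup runs even when the task is being killed
        }
        if (task.killed) {
          throw new TaskKilledException  // surfaces to the driver as state KILLED
        }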




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-205967941
  
    I've been looking into this and running some tests, but at this point I haven't actually had the kill triggered. I always end up with the other attempts getting marked as failed because they finish soon after the original. Trying some more and looking into a few other things.
    
    Also cc @andrewor14 




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-205788878
  
    **[Test build #54979 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54979/consoleFull)** for PR 11996 at commit [`ba9ffab`](https://github.com/apache/spark/commit/ba9ffab65f9f003af3a27671b8610525c2e38d84).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222010206
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59412/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221531103
  
    @kayousterhout, can you have a look at this?




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222103539
  
    **[Test build #59477 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59477/consoleFull)** for PR 11996 at commit [`8767e4c`](https://github.com/apache/spark/commit/8767e4cd7a764a4aec080fdbf7669cb1f8bfd195).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r63448950
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    +    val speculation = manager.checkSpeculatableTasks
    +    assert(speculation === true)
    --- End diff --
    
    Here, just do "assert(manager.checkSpeculatableTasks)"? (The === isn't helpful for boolean values, where it's obvious what the expected/actual values were.)




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64658256
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,52 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulatorV2[_, _]]] = taskSet.tasks.map { task =>
    +      task.metrics.internalAccums
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    +    assert(manager.checkSpeculatableTasks)
    +    // Offer resource to start the speculative attempt for the running task
    +    val taskOption5 = manager.resourceOffer("exec1", "host1", NO_PREF)
    +    assert(taskOption5.isDefined)
    +    val task5 = taskOption5.get
    +    assert(task5.taskId === 4)
    +    assert(task5.executorId === "exec1")
    +    assert(task5.attemptNumber === 1)
    +    sched.backend = mock(classOf[SchedulerBackend])
    +    // Complete the speculative attempt for the running task
    +    manager.handleSuccessfulTask(4, createTaskResult(3, accumUpdatesByTask(3)))
    +    // It ends the task with success status as part of manager.handleSuccessfulTask() and
    --- End diff --
    
    I see what's going on here now -- can you change this comment to something like "Because the SchedulerBackend was a mock, the 2nd copy of the task won't actually be killed, so the FakeTaskScheduler is only told about the successful completion of the speculated task."? I think it would also be helpful to move the verify(...) call before this.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219752958
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-220082195
  
    Thanks a lot @kayousterhout for the review.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221716392
  
    A few last comments to aid readability of this code. Thanks!




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221970378
  
    **[Test build #59404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59404/consoleFull)** for PR 11996 at commit [`db93d17`](https://github.com/apache/spark/commit/db93d1766f39d4f2d9fe9be0d1ce9edd68110f76).
     * This patch **fails to build**.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64654392
  
    --- Diff: core/src/main/scala/org/apache/spark/ui/UIUtils.scala ---
    @@ -338,6 +338,16 @@ private[spark] object UIUtils extends Logging {
           failed: Int,
           skipped: Int,
           total: Int): Seq[Node] = {
    +    makeProgressBar(started, completed, failed, skipped, killed = 0, total)
    +  }
    +
    +  def makeProgressBar(
    --- End diff --
    
    Can you change the main makeProgressBar method to always accept a killed argument (and then update the 2-3 other locations where this is called to pass in killed = 0)? It looks like the other places that use this method (e.g., the streaming BatchPage UI class) have access to the correct number of killed tasks, so they should set it accordingly (and it's a little cumbersome to maintain both versions of this method).
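
    A sketch of the unified signature (parameter order as in the overload
    above; call sites without kill tracking pass killed = 0 explicitly):

        def makeProgressBar(
            started: Int,
            completed: Int,
            failed: Int,
            skipped: Int,
            killed: Int,
            total: Int): Seq[Node] = {
          // ... render the bar, including a segment for killed tasks ...
        }

        // a caller that does not track kills:
        UIUtils.makeProgressBar(started, completed, failed, skipped, killed = 0, total)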




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221970389
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/59404/
    Test FAILed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-219752963
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/58693/
    Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-222589679
  
    Thanks @kayousterhout.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-220114358
  
    **[Test build #58791 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/58791/consoleFull)** for PR 11996 at commit [`4aa7e83`](https://github.com/apache/spark/commit/4aa7e83bd375b04e550fcb4cb18a8bcfc8e78e17).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64653419
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/TaskInfo.scala ---
    @@ -58,24 +60,26 @@ class TaskInfo(
     
       var failed = false
     
    +  var killed = false
    +
       private[spark] def markGettingResult(time: Long = System.currentTimeMillis) {
         gettingResultTime = time
       }
     
    -  private[spark] def markSuccessful(time: Long = System.currentTimeMillis) {
    +  private[spark] def markFinished(time: Long = System.currentTimeMillis, state: TaskState) {
    --- End diff --
    
    Can you move state to be first? We usually put default arguments last, and then you don't need to always name the "state" argument when calling this method.
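
    A sketch with the parameters reordered (body abbreviated; the
    state-to-flag mapping is an assumption about the final shape):

        private[spark] def markFinished(state: TaskState, time: Long = System.currentTimeMillis) {
          finishTime = time
          if (state == TaskState.FAILED) {
            failed = true
          } else if (state == TaskState.KILLED) {
            killed = true
          }
        }

        // call sites then read naturally without naming the argument:
        info.markFinished(TaskState.KILLED)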




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-203061970
  
    Sorry, I haven't had time to look at this yet; hopefully in a day or so.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r58542903
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -789,6 +791,51 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         assert(TaskLocation("executor_host1_3") === ExecutorCacheTaskLocation("host1", "3"))
       }
     
    +  test("Kill other task attempts when one attempt belonging to the same task succeeds") {
    +    sc = new SparkContext("local", "test")
    +    val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
    +    val taskSet = FakeTask.createTaskSet(4)
    +    val manager = new TaskSetManager(sched, taskSet, MAX_TASK_FAILURES)
    +    val accumUpdatesByTask: Array[Seq[AccumulableInfo]] = taskSet.tasks.map { task =>
    +      task.initialAccumulators.map { a => a.toInfo(Some(0L), None) }
    +    }
    +    // Offer resources for 4 tasks to start
    +    for ((k, v) <- List(
    +        "exec1" -> "host1",
    +        "exec1" -> "host1",
    +        "exec2" -> "host2",
    +        "exec2" -> "host2")) {
    +      val taskOption = manager.resourceOffer(k, v, NO_PREF)
    +      assert(taskOption.isDefined)
    +      val task = taskOption.get
    +      assert(task.executorId === k)
    +    }
    +    assert(sched.startedTasks.toSet === Set(0, 1, 2, 3))
    +    // Complete the 3 tasks and leave 1 task in running
    +    for (id <- Set(0, 1, 2)) {
    +      manager.handleSuccessfulTask(id, createTaskResult(id, accumUpdatesByTask(id)))
    +      assert(sched.endedTasks(id) === Success)
    +    }
    +
    +    // Wait for the threshold time to start speculative attempt for the running task
    +    Thread.sleep(100)
    --- End diff --
    
    It would be better to set spark.speculation.interval to a known small value in case the default changes. We might as well make it smaller too, so the test takes less time.
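
    For example (the "10ms" value is illustrative; the final patch instead
    set spark.speculation.multiplier to 0, as seen further below):

        val conf = new SparkConf()
          .set("spark.speculation", "true")
          .set("spark.speculation.interval", "10ms")
        sc = new SparkContext("local", "test", conf)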




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11996#discussion_r64814037
  
    --- Diff: core/src/test/scala/org/apache/spark/scheduler/TaskSetManagerSuite.scala ---
    @@ -795,6 +795,7 @@ class TaskSetManagerSuite extends SparkFunSuite with LocalSparkContext with Logg
         sc = new SparkContext("local", "test")
         val sched = new FakeTaskScheduler(sc, ("exec1", "host1"), ("exec2", "host2"))
         val taskSet = FakeTask.createTaskSet(4)
    +    sc.conf.set("spark.speculation.multiplier", "0.0")
    --- End diff --
    
    Can you add a comment here ("Set the speculation multiplier to be 0 so speculative tasks are launched immediately")?
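
    i.e.:

        // Set the speculation multiplier to be 0 so speculative tasks are launched immediately
        sc.conf.set("spark.speculation.multiplier", "0.0")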




[GitHub] spark issue #11996: [SPARK-10530] [CORE] Kill other task attempts when one t...

Posted by lw-lin <gi...@git.apache.org>.
Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/11996
  
    @devaraj-kavali @kayousterhout this is good to have, but I just wonder if this could cause resources to leak? E.g., what if the task is in the middle of releasing resources in a `finally` block -- like [Executor.scala#L281](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/executor/Executor.scala#L281) -- when it gets killed and interrupted?
    
    Thanks!




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-211963096
  
    ping @kayousterhout and @markhamstra




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-220114670
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-221982283
  
    **[Test build #59412 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/59412/consoleFull)** for PR 11996 at commit [`b05908c`](https://github.com/apache/spark/commit/b05908c2dfcee00cb732a72bb6ab1000a5cf5cd0).




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-204313542
  
    Thanks @tgravescs for checking this; I will add a test for these changes.




[GitHub] spark pull request: [SPARK-10530] [CORE] Kill other task attempts ...

Posted by devaraj-kavali <gi...@git.apache.org>.
Github user devaraj-kavali commented on the pull request:

    https://github.com/apache/spark/pull/11996#issuecomment-218671842
  
    @kayousterhout, @markhamstra any comments, please?

