Posted to reviews@spark.apache.org by sitalkedia <gi...@git.apache.org> on 2016/04/16 01:52:58 UTC

[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/12436

    [SPARK-14649][CORE] DagScheduler should not run duplicate tasks on fe…

    ## What changes were proposed in this pull request?
    
    Currently, in the case of a fetch failure, the DAGScheduler reruns all the pending tasks of the failed stage, even if some of those tasks are still running. This leaves many duplicate tasks running on the cluster.
    
    ## How was this patch tested?
    
    Added a new test case for it and made sure the test case failed without the change.
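    As a rough illustration of the behavior change (a hypothetical Python sketch, not Spark's actual scheduler code; names are invented), the difference in which tasks get resubmitted after a fetch failure can be modeled as:

    ```python
    # Hypothetical sketch: contrast the current resubmission behavior with the
    # proposed one for a stage hit by a fetch failure.

    def tasks_to_resubmit(all_tasks, completed, running, skip_running):
        """Return the task ids the scheduler would launch again.

        skip_running=False models the current behavior: every task that has not
        completed is rerun, even if a copy is still running (duplicates).
        skip_running=True models this patch: still-running tasks are left alone.
        """
        pending = set(all_tasks) - set(completed)
        if skip_running:
            pending -= set(running)
        return pending

    all_tasks = range(10)
    completed = {0, 1, 2}
    running = {3, 4, 5}   # still executing when the fetch failure arrives

    current = tasks_to_resubmit(all_tasks, completed, running, skip_running=False)
    proposed = tasks_to_resubmit(all_tasks, completed, running, skip_running=True)
    # Current behavior launches duplicate copies of tasks 3-5; the proposed
    # behavior resubmits only tasks that are neither finished nor running.
    ```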

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark avoid_duplicate_tasks

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12436.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12436
    
----
commit 3c77e69345b3ef82b6d4a07e202a836ec75c153e
Author: Sital Kedia <sk...@fb.com>
Date:   2016-04-15T23:44:23Z

    [SPARK-14649][CORE] DagScheduler should not run duplicate tasks on fetch failure

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @kayousterhout - Our use case is a very large Spark workload. We are processing around 100 TB of data in a single Spark job with 100k tasks in it (incidentally, the single-threaded DAGScheduler is becoming the bottleneck at this scale). Each individual task can run for more than an hour, and the jobs run for more than 10 hours. Seeing a few machines reboot during a job run is very common in this case. If we break down the task execution time, out of one hour we spend around 5 - 10 minutes fetching the data and the rest in actual execution. When a machine goes down, it is very common for us to see a few tasks fail while they are still in the shuffle-fetch phase, but other tasks that have already fetched their data are not affected. That's why we don't want already-running tasks to be rerun in case of a fetch failure.
    
    Again, killing all the tasks in case of a fetch failure is also not a good idea, because it would just waste a lot of resources.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    >> Also, separately from what approach is used, how do you deal with the following: suppose map task 1 loses its output (e.g., the reducer where that task is located dies). Now, suppose reduce task A gets a fetch failure for map task 1, triggering map task 1 to be re-run. Meanwhile, reduce task B is still running. Now the re-run map task 1 completes and the scheduler launches the reduce phase again. Suppose after that happens, task B fails (this is the old task B, that started before the fetch failure) because it can't get the data from map task 1, but that's because it still has the old location for map task 1. My understanding is that, with the current code, that would cause the map stage to get re-triggered again, but really, reduce task B should be re-started with the correct location for the output from map 1.
    
    @kayousterhout - How do you think we can handle this issue?




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-217337519
  
    No, this is not the same issue. SPARK-14915 deals with duplicate tasks in the case of speculation, but this change has nothing to do with speculation; it fixes the issue of running duplicate tasks in the case of a fetch failure.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @davies - Thanks for looking into this. I have updated the PR description with details of the change. Let me know if the approach seems reasonable, and I will work on rebasing the change against the latest master.




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-222558428
  
    @sitalkedia can you update this to resolve the merge conflicts?




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-217102797
  
    This would need a rebase; is it the same as https://issues.apache.org/jira/browse/SPARK-14915 ?




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    Yeah @mridulm that also seems like an issue with this approach.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @rxin - The idea is not to rerun or kill already running tasks in case of fetch failure because they might finish. If those tasks end up failing later, the dag scheduler will rerun them.




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-210690425
  
    Can one of the admins verify this patch?




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by markhamstra <gi...@git.apache.org>.
Github user markhamstra commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    See https://issues.apache.org/jira/browse/SPARK-17064




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @kayousterhout - Thanks for taking a look at the PR. I currently don't have time to work on it. I will close this PR and open a new one with the issues addressed.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by mridulm <gi...@git.apache.org>.
Github user mridulm commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    
    I am curious how this is resilient to epoch changes, which will be triggered by executor loss when the executor that ran a shuffle map task is gone.
    Won't it create issues if we try to continue to (re)use the earlier stage @rxin @kayousterhout ?




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12436#discussion_r61660573
  
    --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
    @@ -1259,11 +1282,6 @@ class DAGScheduler(
             val failedStage = stageIdToStage(task.stageId)
             val mapStage = shuffleToMapStage(shuffleId)
     
    -        if (failedStage.latestInfo.attemptId != task.stageAttemptId) {
    --- End diff --
    
    Please note that after this change we cannot ignore fetch failures from a previous attempt, because we no longer have duplicate tasks for the fetch-failed tasks.
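    The guard being removed here can be sketched in miniature (hypothetical Python, not the actual Scala code; the flag name is invented):

    ```python
    # Hypothetical sketch of the guard this diff removes. Before the change, a
    # fetch failure reported by a task from an older stage attempt could be
    # ignored, because a duplicate of that task was running in a newer attempt.

    def ignore_fetch_failure(latest_attempt_id, task_stage_attempt_id,
                             duplicates_exist):
        # Old behavior: the failure came from a stale attempt and a fresh
        # duplicate of the task exists, so the failure carries no new info.
        return duplicates_exist and latest_attempt_id != task_stage_attempt_id

    # Before the change: stale-attempt failures could be dropped.
    old = ignore_fetch_failure(latest_attempt_id=2, task_stage_attempt_id=1,
                               duplicates_exist=True)
    # After the change no duplicates are launched, so the same failure
    # must be handled rather than dropped.
    new = ignore_fetch_failure(latest_attempt_id=2, task_stage_attempt_id=1,
                               duplicates_exist=False)
    ```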




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia Had a quick look at this one; the use case sounds good, and we should improve stability for long-running tasks. Could you explain a bit more how the current patch works (in the PR description)?




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-218474544
  
    Found an issue where the job could get stuck in a corner case; fixed it and added a test case as well.




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-222590636
  
    @kayousterhout - Sure, I will resolve the conflicts. Can you take a cursory look at the diff and let me know if the approach is reasonable? 




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia What's the use case for this?  In the cases I've seen, if there's one fetch failure, it typically means that a machine that ran a map task has failed / gone down / been revoked by the cluster manager, and as a result, *none* of the reduce tasks will succeed.  The tasks from the first attempt of the reduce stage eventually fail, because they require the output that's being re-computed in the map phase.  Why isn't this happening in the cases you're seeing?
    
    I do think it would be worthwhile to implement the TODO in TaskSetManager.abort (which says we should kill running tasks), which would be a simpler fix to avoid the duplicate tasks (but I'm wondering if there's some reason you're seeing that the still-running tasks might actually succeed?).




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia I was thinking about this over the weekend and I'm not sure this is the right approach.  I suspect it might be better to re-use the same task set manager for the new stage.  This copying of information is confusing and I'm concerned it will be bug-prone in the future.  Did you consider that approach?
    
    Also, separately from what approach is used, how do you deal with the following: suppose map task 1 loses its output (e.g., the reducer where that task is located dies).  Now, suppose reduce task A gets a fetch failure for map task 1, triggering map task 1 to be re-run.  Meanwhile, reduce task B is still running.  Now the re-run map task 1 completes and the scheduler launches the reduce phase again.  Suppose after that happens, task B fails (this is the old task B, that started before the fetch failure) because it can't get the data from map task 1, but that's because it still has the old location for map task 1.  My understanding is that, with the current code, that would cause the map stage to get re-triggered again, but really, reduce task B should be re-started with the correct location for the output from map 1.
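    The stale-location race described above can be modeled with a minimal hypothetical sketch (plain Python, not Spark code; hostnames and structures are invented for illustration):

    ```python
    # Hypothetical sketch of the race: old reduce task B keeps a stale
    # location for map task 1's output.

    map_output_locations = {1: "hostA"}   # driver's view: map 1 output on hostA

    # Reduce task B snapshots the locations when it launches.
    task_b_locations = dict(map_output_locations)

    # hostA is lost; map task 1 is re-run and its output re-registered.
    map_output_locations[1] = "hostB"

    # Old task B still fetches from its stale snapshot, so its fetch fails
    # even though valid output now exists at hostB.
    old_task_b_fails = task_b_locations[1] != map_output_locations[1]

    # A retried task B launched with refreshed locations would succeed,
    # without re-triggering the whole map stage.
    task_b_retry_locations = dict(map_output_locations)
    retry_fails = task_b_retry_locations[1] != map_output_locations[1]
    ```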




[GitHub] spark pull request #12436: [SPARK-14649][CORE] DagScheduler should not run d...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia closed the pull request at:

    https://github.com/apache/spark/pull/12436




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    ping. 




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    ping.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by kayousterhout <gi...@git.apache.org>.
Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia this has been inactive for a while and there were a few issues pointed out above that haven't yet been resolved.  Do you have time to work on this? Otherwise, can you close the PR?




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-217045719
  
    Can someone take a look? 
    cc - @srowen 




[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-211496262
  
    @kayousterhout




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by jisookim0513 <gi...@git.apache.org>.
Github user jisookim0513 commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia have you had a chance to work on this issue and open a new PR?




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    Is the idea here to not rerun jobs that are already running in the case of a fetch failure, because they might finish?
    
    What happens after the change if those tasks end up coming back as failures?





[GitHub] spark pull request: [SPARK-14649][CORE] DagScheduler should not ru...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the pull request:

    https://github.com/apache/spark/pull/12436#issuecomment-214402062
  
    I found a bug in my change, and the job was stuck because of it. I am going to fix the issue (with an updated test case to cover the scenario) and update the PR.




[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Posted by sitalkedia <gi...@git.apache.org>.
Github user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @jisookim0513 - created a new PR - https://github.com/apache/spark/pull/17297

