Posted to reviews@spark.apache.org by kayousterhout <gi...@git.apache.org> on 2016/07/01 02:54:52 UTC

[GitHub] spark issue #12436: [SPARK-14649][CORE] DagScheduler should not run duplicat...

Github user kayousterhout commented on the issue:

    https://github.com/apache/spark/pull/12436
  
    @sitalkedia What's the use case for this?  In the cases I've seen, if there's one fetch failure, it typically means that a machine that ran a map task has failed / gone down / been revoked by the cluster manager, and as a result, *none* of the reduce tasks will succeed.  The tasks from the first attempt of the reduce stage eventually fail as well, because they require the output that's being re-computed in the map phase.  Why isn't this happening in the cases you're seeing?
    
    I do think it would be worthwhile to implement the TODO in TaskSetManager.abort (which says we should kill running tasks); that would be a simpler fix to avoid the duplicate tasks (though I'm wondering if there's some reason you're seeing that the still-running tasks might actually succeed?).
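
For reference, the TODO mentioned above lives in TaskSetManager.abort, which at the time only notified the DAGScheduler and marked the task set as a zombie. Below is a minimal sketch of what killing the still-running attempts there might look like; the identifiers used (runningTasksSet, taskInfos, sched.backend.killTask) are assumptions about Spark internals of that era, not the actual change:

    // A minimal sketch, not the actual patch: one way the TODO inside
    // TaskSetManager.abort could be implemented. It assumes the surrounding
    // class members (runningTasksSet, taskInfos, sched, taskSet, isZombie,
    // maybeFinishTaskSet) and the SchedulerBackend.killTask signature of the
    // Spark 2.0 era; exact names and signatures vary by version.
    def abort(message: String, exception: Option[Throwable] = None): Unit = sched.synchronized {
      // Ask the backend to kill every attempt still running in this task set,
      // so a retried stage does not race against zombie attempts.
      for (tid <- runningTasksSet) {
        val info = taskInfos(tid)
        sched.backend.killTask(tid, info.executorId, interruptThread = false)
      }
      // Existing behavior: notify the DAGScheduler and mark the set as zombie.
      sched.dagScheduler.taskSetFailed(taskSet, message, exception)
      isZombie = true
      maybeFinishTaskSet()
    }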

