Posted to reviews@spark.apache.org by liyichao <gi...@git.apache.org> on 2017/05/23 10:05:40 UTC

[GitHub] spark pull request #18070: Convert CommitDenied to TaskKilled.

GitHub user liyichao opened a pull request:

    https://github.com/apache/spark/pull/18070

    Convert CommitDenied to TaskKilled.

    ## What changes were proposed in this pull request?
    
    In the executor, `CommitDeniedException` is converted to `TaskKilledException` to avoid inconsistent task states, because there is a race between when the driver kills a task and when the executor tries to commit.
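    The proposed conversion can be sketched with stand-in exception classes (simplified versions of Spark's real `CommitDeniedException` and `TaskKilledException`; `runTask` here is a hypothetical wrapper for illustration, not the actual `Executor` code):

```scala
// Simplified stand-ins for Spark's exception types.
class CommitDeniedException(msg: String) extends Exception(msg)
class TaskKilledException(val reason: String) extends Exception(reason)

// Hypothetical wrapper illustrating the conversion: a commit denial means the
// driver already accepted another attempt's output, so report the task as
// killed rather than failed.
def runTask[T](body: => T): T =
  try body
  catch {
    case _: CommitDeniedException => throw new TaskKilledException("commit denied")
  }
```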
    
    ## How was this patch tested?
    
    No new tests, because the change is straightforward.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liyichao/spark SPARK-20713

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18070.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18070
    
----
commit 3467d5d2c8dc60654575d4aecd2771a74a0f3fea
Author: Li Yichao <ly...@zhihu.com>
Date:   2017-05-23T09:39:17Z

    Convert CommitDenied to TaskKilled.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    sorry for my delay on getting back to this.
    So if we do that, you would have to have `TaskKilledReason` extend `TaskFailedReason`, because things rely on the `countTowardsTaskFailures` field. Then we would want to go through everywhere those are used and see if we need to update the other code. It doesn't look like there would be too many places to update, so I think this approach sounds good. If you find something that is painful to update, then just updating `JobProgressListener` in a few places would be fine also.
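    A minimal sketch of the hierarchy described here (names and fields only approximate Spark's real `TaskEndReason` code; this is an illustration, not the actual implementation): `TaskKilledReason` extends `TaskFailedReason`, so existing readers of `countTowardsTaskFailures` keep working while killed tasks stop counting against the failure limit.

```scala
sealed trait TaskFailedReason {
  // Existing code consults this flag when deciding whether to fail the job.
  def countTowardsTaskFailures: Boolean = true
}

// New intermediate trait: anything "killed" does not count as a failure.
trait TaskKilledReason extends TaskFailedReason {
  override def countTowardsTaskFailures: Boolean = false
}

case class TaskKilled(reason: String) extends TaskKilledReason
case class TaskCommitDenied(jobId: Int, partitionId: Int, attemptNumber: Int)
  extends TaskKilledReason
case class ExceptionFailure(description: String) extends TaskFailedReason
```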
    





[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

Posted by ericl <gi...@git.apache.org>.
Github user ericl commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18070#discussion_r118628471
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -338,6 +340,9 @@ private[spark] class Executor(
                 metricsSystem = env.metricsSystem)
               threwException = false
               res
    +        } catch {
    +          case _: CommitDeniedException =>
    +            throw new TaskKilledException("commit denied")
    --- End diff --
    
    Doesn't a stage abort also cause tasks to show up as killed (due to "stage cancelled")? https://github.com/apache/spark/blob/95aef660b73ec931e746d1ec8ae7848762ba0d7c/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala#L1531
    
    It seems to me that CommitDenied always implies the task is killed, in which case it might be fine to convert all CommitDeniedExceptions into TaskKilled.
    
    Btw, there's a catch block below -- `case CausedBy(cDE: CommitDeniedException) =>` which seems like the right place to be doing this handling.
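    The `CausedBy` extractor mentioned here unwraps nested causes; below is a self-contained sketch (modeled loosely on Spark's `org.apache.spark.util.CausedBy`, with a hypothetical `classify` helper standing in for the executor's real catch arms):

```scala
class CommitDeniedException(msg: String) extends Exception(msg)

// Walks getCause to the innermost throwable, like Spark's CausedBy extractor.
object CausedBy {
  def unapply(e: Throwable): Option[Throwable] =
    Option(e.getCause).flatMap(unapply).orElse(Some(e))
}

// Hypothetical stand-in for the executor's handling: a commit denial, even
// when wrapped by another exception, should surface as KILLED, not FAILED.
def classify(t: Throwable): String = t match {
  case CausedBy(_: CommitDeniedException) => "KILLED"
  case _                                  => "FAILED"
}
```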




[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao closed the pull request at:

    https://github.com/apache/spark/pull/18070




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    I will update the PR in a day.




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    Sorry, the case I was talking about is with a fetch failure. The true stage abort doesn't happen until it retries 4 times. In the meantime, you can have tasks from the same stage (different attempts) running at the same time, because we currently don't kill the tasks from the aborted stage. Although, thinking about it more, having them show up as killed doesn't hurt anything; it just makes a slightly bigger assumption.





[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    How about letting `TaskCommitDenied` and `TaskKilled` extend the same trait (for example, `TaskKilledReason`)? That way, when accounting for metrics, both `TaskCommitDenied` and `TaskKilled` contribute to the killed-task count rather than the failed-task count.
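    Under a shared trait like this, the metrics-accounting code can pattern-match on the parent type; a sketch with hypothetical names (not the real Spark classes):

```scala
sealed trait TaskEndReason
trait TaskKilledReason extends TaskEndReason
case class TaskKilled(reason: String) extends TaskKilledReason
case class TaskCommitDenied(jobId: Int) extends TaskKilledReason
case class ExceptionFailure(description: String) extends TaskEndReason

// One match arm covers both killed variants, so neither inflates the
// failed-task count.
def metricBucket(reason: TaskEndReason): String = reason match {
  case _: TaskKilledReason => "killed"
  case _                   => "failed"
}
```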




[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18070#discussion_r117977100
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -338,6 +340,9 @@ private[spark] class Executor(
                 metricsSystem = env.metricsSystem)
               threwException = false
               res
    +        } catch {
    +          case _: CommitDeniedException =>
    +            throw new TaskKilledException("commit denied")
    --- End diff --
    
    I'm not sure we want to just convert it here. I was originally thinking we would fix it up on the driver side, because it knows that it was explicitly a speculative task and that it killed it. Here we don't know that for sure.
    
    For instance, you might have got a commit denied because the stage was aborted due to a fetch failure. That shouldn't show up as killed.




[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18070#discussion_r118711377
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -459,7 +459,7 @@ private[spark] class Executor(
             case CausedBy(cDE: CommitDeniedException) =>
               val reason = cDE.toTaskFailedReason
    --- End diff --
    
    We should probably change this to `toTaskCommitDeniedReason`, since it's not a failure anymore.




[GitHub] spark pull request #18070: [SPARK-20713][Spark Core] Convert CommitDenied to...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18070#discussion_r118050668
  
    --- Diff: core/src/main/scala/org/apache/spark/executor/Executor.scala ---
    @@ -338,6 +340,9 @@ private[spark] class Executor(
                 metricsSystem = env.metricsSystem)
               threwException = false
               res
    +        } catch {
    +          case _: CommitDeniedException =>
    +            throw new TaskKilledException("commit denied")
    --- End diff --
    
    Maybe we should throw a more specific exception for the `already committed` case? The executor can know about the `already committed` case before it sends the statusUpdate, so we do not need to wait for the driver's statusUpdate.
    
    It also seems OK to fix it up on the driver; I'll try it tomorrow.




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    ping @liyichao Will you address the latest comments from @tgravescs ?




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    cc @ericl 




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    There is actually another pull request up that does the same thing:
     https://github.com/apache/spark/pull/18819




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    ping @tgravescs 




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by liyichao <gi...@git.apache.org>.
Github user liyichao commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    Oh, I did not notice that. Since @nlyu is following up, I will close this PR now.




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18070: [SPARK-20713][Spark Core] Convert CommitDenied to TaskKi...

Posted by tgravescs <gi...@git.apache.org>.
Github user tgravescs commented on the issue:

    https://github.com/apache/spark/pull/18070
  
    Thanks for the updates. I was testing this out by running a large job with speculative tasks, and I am still seeing the stage summary show failed tasks. It looks like it's due to this code:
    https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/ui/jobs/JobProgressListener.scala#L396
    
    It doesn't differentiate the commit-denied message, so I think we need to handle it there so the stats show up properly.
    
    It would also be good to add a unit test for that, where you can look at the `stageData` to make sure `numFailedTasks` is what you expect.
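    A hedged sketch of the kind of listener test suggested here, using stand-in types rather than Spark's real `JobProgressListener`/`StageUIData` (all names here are hypothetical):

```scala
sealed trait TaskEndReason
case class TaskCommitDenied(jobId: Int) extends TaskEndReason
case class ExceptionFailure(description: String) extends TaskEndReason

class StageData {
  var numFailedTasks = 0
  var numKilledTasks = 0
}

// Stand-in for the listener's task-end accounting: commit-denied tasks are
// tallied as killed, not failed, so the stage summary stays accurate.
def recordTaskEnd(stage: StageData, reason: TaskEndReason): Unit = reason match {
  case _: TaskCommitDenied => stage.numKilledTasks += 1
  case _: ExceptionFailure => stage.numFailedTasks += 1
}
```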

