Posted to issues@spark.apache.org by "Thomas Graves (JIRA)" <ji...@apache.org> on 2016/09/14 19:35:20 UTC

[jira] [Resolved] (SPARK-17511) Dynamic allocation race condition: Containers getting marked failed while releasing

     [ https://issues.apache.org/jira/browse/SPARK-17511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Graves resolved SPARK-17511.
-----------------------------------
       Resolution: Fixed
         Assignee: Kishor Patil
    Fix Version/s: 2.1.0
                   2.0.1

> Dynamic allocation race condition: Containers getting marked failed while releasing
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-17511
>                 URL: https://issues.apache.org/jira/browse/SPARK-17511
>             Project: Spark
>          Issue Type: Bug
>          Components: YARN
>    Affects Versions: 2.0.0, 2.0.1, 2.1.0
>            Reporter: Kishor Patil
>            Assignee: Kishor Patil
>             Fix For: 2.0.1, 2.1.0
>
>
> While trying to launch multiple containers in the pool, if the running executor count reaches or exceeds the target number of running executors, the container is released and marked failed. This can cause many jobs to be marked failed, causing overall job failure.
> I will have a patch up soon after completing testing.
> {panel:title=Typical exception found in the driver when marking the container as failed}
> {code}
> java.lang.AssertionError: assertion failed
>         at scala.Predef$.assert(Predef.scala:156)
>         at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1.org$apache$spark$deploy$yarn$YarnAllocator$$anonfun$$updateInternalState$1(YarnAllocator.scala:489)
>         at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:519)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
> {code}
> {panel}
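
The race described above is a check-then-act problem: several launcher threads in a pool each decide to start an executor, and by the time one of them updates the running count the target has already been reached. The following is a minimal sketch of one way to avoid that, by claiming a slot atomically before launching and treating a refused claim as a skip rather than a failure. It is not Spark's YarnAllocator code and not the patch for this issue; the class ExecutorSlots and the methods tryReserve and simulate are hypothetical names used only for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only; this is not Spark's YarnAllocator.
public class ExecutorSlots {
    private final int target;                                  // target number of running executors
    private final AtomicInteger running = new AtomicInteger(0);

    public ExecutorSlots(int target) {
        this.target = target;
    }

    /** Atomically claims one slot; returns false if the target is already reached. */
    public boolean tryReserve() {
        while (true) {
            int current = running.get();
            if (current >= target) {
                return false;                                  // skip the launch, do not mark it failed
            }
            if (running.compareAndSet(current, current + 1)) {
                return true;                                   // slot claimed; safe to launch
            }
            // Another thread changed the count first; re-read and retry.
        }
    }

    public int running() {
        return running.get();
    }

    /** Offers `containers` containers from a thread pool; returns {launched, skipped, running}. */
    public static int[] simulate(int target, int containers) {
        ExecutorSlots slots = new ExecutorSlots(target);
        AtomicInteger launched = new AtomicInteger();
        AtomicInteger skipped = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < containers; i++) {
            pool.execute(() -> {
                if (slots.tryReserve()) {
                    launched.incrementAndGet();                // stands in for starting the executor
                } else {
                    skipped.incrementAndGet();                 // stands in for releasing the container
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("launcher pool did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {launched.get(), skipped.get(), slots.running()};
    }

    public static void main(String[] args) {
        int[] r = simulate(4, 16);
        System.out.println("launched=" + r[0] + " skipped=" + r[1] + " running=" + r[2]);
        // prints "launched=4 skipped=12 running=4"
    }
}
```

Because the check and the increment happen in one compare-and-set step, the running count cannot pass the target, so no assertion on the count is needed after the launch and surplus containers are simply skipped.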



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
