Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2016/04/11 18:41:25 UTC

[jira] [Commented] (SPARK-14537) [CORE] SparkContext init hangs if master removes application before backend is ready.

    [ https://issues.apache.org/jira/browse/SPARK-14537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235449#comment-15235449 ] 

Apache Spark commented on SPARK-14537:
--------------------------------------

User 'drcrallen' has created a pull request for this issue:
https://github.com/apache/spark/pull/12301

> [CORE] SparkContext init hangs if master removes application before backend is ready.
> -------------------------------------------------------------------------------------
>
>                 Key: SPARK-14537
>                 URL: https://issues.apache.org/jira/browse/SPARK-14537
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler
>    Affects Versions: 1.5.2
>            Reporter: Charles Allen
>
> During SparkContext initialization, the following code is executed:
> {code:scala}
>     setupAndStartListenerBus()
>     postEnvironmentUpdate()
>     postApplicationStart()
>     // Post init
>     _taskScheduler.postStartHook()
>     _env.metricsSystem.registerSource(new BlockManagerSource(_env.blockManager))
>     _executorAllocationManager.foreach { e =>
>       _env.metricsSystem.registerSource(e.executorAllocationManagerSource)
>     }
> {code}
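> For context, postStartHook() on the TaskSchedulerImpl side just delegates to the wait loop quoted at the end of this description; roughly (the exact body may vary by version):
> {code:scala}
> // TaskSchedulerImpl (approximate): block until the backend reports ready.
> override def postStartHook() {
>   waitBackendReady()
> }
> {code}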
> If _taskScheduler.postStartHook() is waiting for a signal from the backend that it is ready, AND the driver has been disconnected from the master with an error similar to the one below:
> {code}
> ERROR [sparkDriver-akka.actor.default-dispatcher-20] org.apache.spark.rpc.akka.AkkaRpcEnv - Ignore error: Exiting due to error from cluster scheduler: Master removed our application: FAILED
> org.apache.spark.SparkException: Exiting due to error from cluster scheduler: Master removed our application: FAILED
> 	at org.apache.spark.scheduler.TaskSchedulerImpl.error(TaskSchedulerImpl.scala:431) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.scheduler.cluster.SparkDeploySchedulerBackend.dead(SparkDeploySchedulerBackend.scala:122) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.deploy.client.AppClient$ClientEndpoint.markDead(AppClient.scala:243) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.deploy.client.AppClient$ClientEndpoint$$anonfun$receive$1.applyOrElse(AppClient.scala:167) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$processMessage(AkkaRpcEnv.scala:177) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1$$anonfun$applyOrElse$4.apply$mcV$sp(AkkaRpcEnv.scala:126) ~[spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.rpc.akka.AkkaRpcEnv.org$apache$spark$rpc$akka$AkkaRpcEnv$$safelyCall(AkkaRpcEnv.scala:197) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1$$anonfun$receiveWithLogging$1.applyOrElse(AkkaRpcEnv.scala:125) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?]
> 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) [scala-library-2.10.5.jar:?]
> 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) [scala-library-2.10.5.jar:?]
> 	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:59) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at org.apache.spark.util.ActorLogReceive$$anon$1.apply(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) [scala-library-2.10.5.jar:?]
> 	at org.apache.spark.util.ActorLogReceive$$anon$1.applyOrElse(ActorLogReceive.scala:42) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at akka.actor.Actor$class.aroundReceive(Actor.scala:467) [akka-actor_2.10-2.3.11.jar:?]
> 	at org.apache.spark.rpc.akka.AkkaRpcEnv$$anonfun$actorRef$lzycompute$1$1$$anon$1.aroundReceive(AkkaRpcEnv.scala:92) [spark-core_2.10-1.5.2-mmx1.jar:1.5.2-mmx1]
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.11.jar:?]
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.11.jar:?]
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.11.jar:?]
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.11.jar:?]
> 	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397) [akka-actor_2.10-2.3.11.jar:?]
> 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.5.jar:?]
> 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.5.jar:?]
> 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.5.jar:?]
> 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.5.jar:?]
> {code}
> Then the SparkContext will hang during init, because the loop that waits for the backend to become ready never checks whether the context is still running:
> {code:scala|title=TaskSchedulerImpl.scala}
>   private def waitBackendReady(): Unit = {
>     if (backend.isReady) {
>       return
>     }
>     while (!backend.isReady) {
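>       // Bug: nothing in this loop checks whether the scheduler/context has
>       // been stopped, so if the master already removed the application it
>       // spins forever and SparkContext init hangs.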
>       synchronized {
>         this.wait(100)
>       }
>     }
>   }
> {code}
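> A minimal sketch of the kind of fix intended (assuming the scheduler can read the context's stopped flag, an AtomicBoolean; see the patch linked below for the actual change):
> {code:scala}
>   private def waitBackendReady(): Unit = {
>     if (backend.isReady) {
>       return
>     }
>     while (!backend.isReady) {
>       // Fail fast if the context was stopped while waiting (e.g. because
>       // the master removed the application) instead of spinning forever.
>       if (sc.stopped.get) {
>         throw new IllegalStateException(
>           "Spark context stopped while waiting for backend")
>       }
>       synchronized {
>         this.wait(100)
>       }
>     }
>   }
> {code}
> With a check like this, a removed application surfaces as an exception from SparkContext init rather than a hang.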
> Proposed patch is at https://github.com/metamx/spark/pull/17/commits/bbb11ba088d30c30d083972ccc556fef0131c3bf
> I'll get a PR going shortly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org