Posted to issues@spark.apache.org by "Matei Zaharia (JIRA)" <ji...@apache.org> on 2014/11/06 18:34:34 UTC

[jira] [Resolved] (SPARK-643) Standalone master crashes during actor restart

     [ https://issues.apache.org/jira/browse/SPARK-643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matei Zaharia resolved SPARK-643.
---------------------------------
    Resolution: Fixed

> Standalone master crashes during actor restart
> ----------------------------------------------
>
>                 Key: SPARK-643
>                 URL: https://issues.apache.org/jira/browse/SPARK-643
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 0.6.1
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>
> The standalone master will crash if it restarts due to an exception:
> {code}
> 12/12/15 03:10:47 ERROR master.Master: Job SkewBenchmark wth ID job-20121215031047-0000 failed 11 times.
> spark.SparkException: Job SkewBenchmark wth ID job-20121215031047-0000 failed 11 times.
>         at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:103)
>         at spark.deploy.master.Master$$anonfun$receive$1.apply(Master.scala:62)
>         at akka.actor.Actor$class.apply(Actor.scala:318)
>         at spark.deploy.master.Master.apply(Master.scala:17)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:626)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:179)
>         at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
>         at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
>         at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
>         at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
>         at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> 12/12/15 03:10:47 INFO master.Master: Starting Spark master at spark://ip-10-226-87-193:7077
> 12/12/15 03:10:47 INFO io.IoWorker: IoWorker thread 'spray-io-worker-1' started
> 12/12/15 03:10:47 ERROR master.Master: Failed to create web UI
> akka.actor.InvalidActorNameException:actor name HttpServer is not unique!
> [05aed000-4665-11e2-b361-12313d316833]
>         at akka.actor.ActorCell.actorOf(ActorCell.scala:392)
>         at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.liftedTree1$1(ActorRefProvider.scala:394)
>         at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:394)
>         at akka.actor.LocalActorRefProvider$Guardian$$anonfun$receive$1.apply(ActorRefProvider.scala:392)
>         at akka.actor.Actor$class.apply(Actor.scala:318)
>         at akka.actor.LocalActorRefProvider$Guardian.apply(ActorRefProvider.scala:388)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:626)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:197)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:179)
>         at akka.dispatch.ForkJoinExecutorConfigurator$MailboxExecutionTask.exec(AbstractDispatcher.scala:516)
>         at akka.jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:259)
>         at akka.jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:975)
>         at akka.jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1479)
>         at akka.jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
> {code}
> When the Master actor restarts, Akka calls its {{postRestart}} hook, which [by default|http://doc.akka.io/docs/akka/snapshot/general/supervision.html#supervision-restart] calls {{preStart}}.  The standalone master's {{preStart}} method then tries to start the web UI a second time, which fails with the {{InvalidActorNameException}} shown above because the {{HttpServer}} actor from the first start is still running.
> I ran into this after a job failed more than 11 times, which causes the Master to throw a SparkException from its {{receive}} method.
> The solution is to implement a custom {{postRestart}} hook that does not call {{preStart}} again.
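
The restart behavior described in the issue can be illustrated with a small stand-alone sketch. It is not the actual Spark patch and does not use Akka: the {{Lifecycle}} trait, {{WebUi}}, {{NaiveMaster}}, {{SafeMaster}} and {{RestartDemo}} are hypothetical names that only model the two hooks, under the assumption that the web UI survives an actor restart and cannot be started twice.

```scala
// Minimal stand-in for an actor lifecycle. This models the hooks only;
// it is NOT the Akka API.
trait Lifecycle {
  def preStart(): Unit = ()
  // Mirrors the default described in the issue: a restart re-runs preStart.
  def postRestart(reason: Throwable): Unit = preStart()
}

// Hypothetical web UI that refuses to start twice, standing in for the
// non-unique "HttpServer" actor name seen in the log.
class WebUi {
  private var running = false
  var startCount = 0
  def start(): Unit = {
    if (running) throw new IllegalStateException("web UI already running")
    running = true
    startCount += 1
  }
}

// Master that inherits the default postRestart, so a restart starts the UI again.
class NaiveMaster(ui: WebUi) extends Lifecycle {
  override def preStart(): Unit = ui.start()
}

// Master with a custom postRestart that skips the one-time setup.
class SafeMaster(ui: WebUi) extends Lifecycle {
  override def preStart(): Unit = ui.start()
  override def postRestart(reason: Throwable): Unit = {
    // Deliberately does not call preStart(): the web UI is still running.
  }
}

object RestartDemo {
  // Simulates an initial start followed by one restart after a failure.
  // Returns true if the restart completed without an exception.
  def startThenRestart(master: Lifecycle): Boolean = {
    master.preStart()
    try {
      master.postRestart(new RuntimeException("job failed"))
      true
    } catch {
      case _: IllegalStateException => false
    }
  }

  def main(args: Array[String]): Unit = {
    println(startThenRestart(new NaiveMaster(new WebUi))) // false
    println(startThenRestart(new SafeMaster(new WebUi)))  // true
  }
}
```

In a real Akka actor the equivalent change would be an override of {{postRestart(reason: Throwable)}}; whether any other setup from {{preStart}} needs to be repeated after a restart depends on the master's actual code and is not covered by this sketch.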


