Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2014/08/28 19:40:09 UTC

[jira] [Resolved] (SPARK-3150) NullPointerException in Spark recovery after simultaneous fall of master and driver

     [ https://issues.apache.org/jira/browse/SPARK-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-3150.
-------------------------------

       Resolution: Fixed
    Fix Version/s: 1.0.3
                   1.1.1

> NullPointerException in Spark recovery after simultaneous fall of master and driver
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-3150
>                 URL: https://issues.apache.org/jira/browse/SPARK-3150
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2
>         Environment:  Linux 3.2.0-23-generic x86_64
>            Reporter: Tatiana Borisova
>             Fix For: 1.1.1, 1.0.3
>
>
> The issue occurs when Spark runs in standalone mode on a cluster.
> When the master and the driver fail simultaneously on the same cluster node, the master tries to recover its state and restart the Spark driver.
> While restarting the driver, the master crashes with a NullPointerException (stack trace below).
> After crashing, it restarts, tries to recover its state, and attempts to restart the Spark driver again; this repeats in an infinite cycle.
> Specifically, the master reads the DriverInfo state back from ZooKeeper, but after deserialization the DriverInfo.worker field turns out to be null.
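>
> To make the failure mode concrete, here is a minimal, self-contained sketch of the suspected mechanism (DriverInfoLike, roundTrip, and TransientNullDemo are illustrative stand-ins, not Spark code). Java serialization skips @transient fields on write and does not re-run their initializers on read, so an Option-typed field comes back as null rather than None:
>
>     import java.io._
>
>     // Stand-in for Spark's DriverInfo, which likewise declares
>     // `@transient var worker: Option[WorkerInfo]`.
>     class DriverInfoLike(val id: String) extends Serializable {
>       @transient var worker: Option[String] = None
>     }
>
>     object TransientNullDemo {
>       // Serialize and deserialize an object (assumed serializable), as a
>       // ZooKeeper-backed persistence engine effectively does on recovery.
>       def roundTrip[T <: AnyRef](obj: T): T = {
>         val bytes = new ByteArrayOutputStream()
>         val out = new ObjectOutputStream(bytes)
>         out.writeObject(obj)
>         out.close()
>         val in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
>         in.readObject().asInstanceOf[T]
>       }
>
>       def main(args: Array[String]): Unit = {
>         val recovered = roundTrip(new DriverInfoLike("driver-1"))
>         println(recovered.worker)           // prints "null", not "None"
>         // Master.completeRecovery filters drivers with `_.worker.isEmpty`;
>         // calling .isEmpty on the null field throws the NPE shown at
>         // Master.scala:448 in the stack trace below.
>         println(recovered.worker.isEmpty)   // throws NullPointerException
>       }
>     }
>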
> Stack trace (from version 1.0.0, but reproducible on version 1.0.2 as well):
> [2014-08-14 21:44:59,519] ERROR  (akka.actor.OneForOneStrategy)
> java.lang.NullPointerException
>         at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
>         at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
>         at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
>         at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
>         at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
>         at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
>         at org.apache.spark.deploy.master.Master.completeRecovery(Master.scala:448)
>         at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:376)
>         at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
>         at akka.actor.ActorCell.invoke(ActorCell.scala:456)
>         at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
>         at akka.dispatch.Mailbox.run(Mailbox.scala:219)
>         at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
>         at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>         at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>         at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>         at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> How to reproduce: while running Spark standalone on a cluster, kill all Spark processes on the node where the driver runs (i.e. kill the driver, master, and worker simultaneously).
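
A likely remedy, and the pattern the neighboring WorkerInfo and ApplicationInfo classes already use, is to re-initialize transient fields in a custom readObject hook so they are restored right after deserialization. The sketch below applies that pattern to the stand-in class from the earlier example; it is illustrative, not necessarily the exact patch shipped in 1.0.3/1.1.1:

    import java.io.ObjectInputStream

    class DriverInfoLike(val id: String) extends Serializable {
      @transient var worker: Option[String] = _

      init()

      // Restore defaults for transient fields; called both from the
      // constructor and after deserialization.
      private def init(): Unit = {
        worker = None
      }

      // Invoked by Java serialization on read; transient fields are
      // null at this point, so re-initialize them explicitly.
      private def readObject(in: ObjectInputStream): Unit = {
        in.defaultReadObject()
        init()
      }
    }

Round-tripping this version through the roundTrip helper above yields worker == None instead of null, so a filter such as drivers.filter(_.worker.isEmpty) in completeRecovery no longer dereferences a null field.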



--
This message was sent by Atlassian JIRA
(v6.2#6252)
