Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2014/08/28 19:40:09 UTC
[jira] [Resolved] (SPARK-3150) NullPointerException in Spark recovery after simultaneous fall of master and driver
[ https://issues.apache.org/jira/browse/SPARK-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen resolved SPARK-3150.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.0.3
1.1.1
> NullPointerException in Spark recovery after simultaneous fall of master and driver
> -----------------------------------------------------------------------------------
>
> Key: SPARK-3150
> URL: https://issues.apache.org/jira/browse/SPARK-3150
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 1.0.2
> Environment: Linux 3.2.0-23-generic x86_64
> Reporter: Tatiana Borisova
> Fix For: 1.1.1, 1.0.3
>
>
> The issue happens when Spark is run standalone on a cluster.
> When the master and the driver fail simultaneously on one node of the cluster, the master tries to recover its state and restart the Spark driver.
> While restarting the driver, the master crashes with a NullPointerException (stack trace below).
> After crashing, it restarts, tries to recover its state, and attempts to restart the Spark driver again, repeating this in an infinite cycle.
> Specifically, Spark reads the DriverInfo state from ZooKeeper, but after the read, DriverInfo.worker turns out to be null.
> Stack trace (on version 1.0.0, but reproducible on version 1.0.2 as well):
> [2014-08-14 21:44:59,519] ERROR (akka.actor.OneForOneStrategy)
> java.lang.NullPointerException
> at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
> at org.apache.spark.deploy.master.Master$$anonfun$completeRecovery$5.apply(Master.scala:448)
> at scala.collection.TraversableLike$$anonfun$filter$1.apply(TraversableLike.scala:264)
> at scala.collection.mutable.HashSet.foreach(HashSet.scala:79)
> at scala.collection.TraversableLike$class.filter(TraversableLike.scala:263)
> at scala.collection.AbstractTraversable.filter(Traversable.scala:105)
> at org.apache.spark.deploy.master.Master.completeRecovery(Master.scala:448)
> at org.apache.spark.deploy.master.Master$$anonfun$receive$1.applyOrElse(Master.scala:376)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
> at akka.actor.ActorCell.invoke(ActorCell.scala:456)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
> at akka.dispatch.Mailbox.run(Mailbox.scala:219)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> How to reproduce: when running Spark standalone on a cluster, kill all Spark processes on the node where the driver runs (i.e. kill the driver, master, and worker simultaneously).
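
The failure mode described above can be sketched as follows. This is a minimal, hypothetical stand-in for the master's recovery path, not the actual Spark code or the actual fix: the class names mirror Spark's WorkerInfo/DriverInfo, but the model is simplified. It shows how a driver recovered from ZooKeeper can carry a null worker reference (its worker died with the master), and how guarding with Option avoids the NPE that an unguarded filter on a worker field would throw:

```scala
// Hypothetical, simplified model of the master's recovered state.
case class WorkerInfo(host: String)

// After ZooKeeper recovery, `worker` can be null: the worker that
// hosted this driver died together with the master.
class DriverInfo(val id: String, val worker: WorkerInfo)

object RecoverySketch {
  // Unsafe version (for illustration): dereferencing d.worker throws
  // a NullPointerException when worker is null, as in the stack trace.
  def completeRecoveryUnsafe(drivers: Seq[DriverInfo]): Seq[DriverInfo] =
    drivers.filter(d => d.worker.host.nonEmpty)

  // Defensive version: wrap the possibly-null field in Option so
  // orphaned drivers are filtered out instead of crashing recovery.
  def completeRecovery(drivers: Seq[DriverInfo]): Seq[DriverInfo] =
    drivers.filter(d => Option(d.worker).isDefined)

  def main(args: Array[String]): Unit = {
    val healthy = new DriverInfo("driver-1", WorkerInfo("node-a"))
    val orphan  = new DriverInfo("driver-2", null) // worker lost with the master
    val survivors = completeRecovery(Seq(healthy, orphan))
    println(survivors.map(_.id)) // only driver-1 survives the filter
  }
}
```

With this guard, recovery drops the orphaned driver instead of crashing and re-entering the restart loop; whether the dropped driver should then be relaunched elsewhere is a separate policy decision.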
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org