Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:37:37 UTC
[jira] [Resolved] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hyukjin Kwon resolved SPARK-9503.
---------------------------------
Resolution: Incomplete
> Mesos dispatcher NullPointerException (MesosClusterScheduler)
> -------------------------------------------------------------
>
> Key: SPARK-9503
> URL: https://issues.apache.org/jira/browse/SPARK-9503
> Project: Spark
> Issue Type: Bug
> Components: Mesos
> Affects Versions: 1.4.1
> Environment: branch-1.4 #8dfdca46dd2f527bf653ea96777b23652bc4eb83
> Reporter: Sebastian YEPES FERNANDEZ
> Priority: Major
> Labels: bulk-closed, mesosphere
>
> Hello,
> I have just started using start-mesos-dispatcher and have been noticing some random crashes caused by NPEs.
> Judging by the exception, it looks like in certain situations "queuedDrivers" is empty, which causes the NPE at "submission.cores":
> https://github.com/apache/spark/blob/branch-1.4/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L512-L516
> {code:title=log|borderStyle=solid}
> 15/07/30 23:56:44 INFO MesosRestServer: Started REST server for submitting applications on port 7077
> Exception in thread "Thread-1647" java.lang.NullPointerException
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:437)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler$$anonfun$scheduleTasks$1.apply(MesosClusterScheduler.scala:436)
> at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.scheduleTasks(MesosClusterScheduler.scala:436)
> at org.apache.spark.scheduler.cluster.mesos.MesosClusterScheduler.resourceOffers(MesosClusterScheduler.scala:512)
> I0731 00:53:52.969518 7014 sched.cpp:1625] Asked to abort the driver
> I0731 00:53:52.969895 7014 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0000'
> 15/07/31 00:53:52 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> {code}
> A side effect of this NPE is that after the crash the dispatcher will not start again, because it's already registered (see SPARK-7831):
> {code:title=log|borderStyle=solid}
> 15/07/31 09:55:47 INFO MesosClusterUI: Started MesosClusterUI at http://192.168.0.254:8081
> I0731 09:55:47.715039 8162 sched.cpp:157] Version: 0.23.0
> I0731 09:55:47.717013 8163 sched.cpp:254] New master detected at master@192.168.0.254:5050
> I0731 09:55:47.717381 8163 sched.cpp:264] No credentials provided. Attempting to register without authentication
> I0731 09:55:47.718246 8177 sched.cpp:819] Got error 'Completed framework attempted to re-register'
> I0731 09:55:47.718268 8177 sched.cpp:1625] Asked to abort the driver
> 15/07/31 09:55:47 ERROR MesosClusterScheduler: Error received: Completed framework attempted to re-register
> I0731 09:55:47.719091 8177 sched.cpp:861] Aborting framework '20150730-234528-4261456064-5050-61754-0038'
> 15/07/31 09:55:47 INFO MesosClusterScheduler: driver.run() returned with code DRIVER_ABORTED
> 15/07/31 09:55:47 INFO Utils: Shutdown hook called
> {code}
> I can get around this by removing the zk data:
> {code:title=zkCli.sh|borderStyle=solid}
> rmr /spark_mesos_dispatcher
> {code}
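
The failure mode described in the report (an NPE while reading "submission.cores" during iteration over the queued drivers) can be illustrated with a small, self-contained Scala sketch. This is not the MesosClusterScheduler code: `DriverSubmission`, `NullSafeScheduling`, and `schedulable` are hypothetical names invented for illustration. It also rests on an assumption the report does not confirm: iterating an empty Scala collection runs no loop body, so an NPE inside the loop would more plausibly come from a null entry than from an empty queue. Under that assumption, wrapping each entry in `Option` skips nulls before any field is read.

```scala
// Hypothetical stand-in for a queued driver description; not Spark's actual class.
final case class DriverSubmission(id: String, cores: Double)

object NullSafeScheduling {
  // Returns the ids of the submissions that fit into the offered cores,
  // visiting the queue in order and skipping null entries instead of
  // dereferencing them.
  def schedulable(queued: Seq[DriverSubmission], offeredCores: Double): Seq[String] = {
    var remaining = offeredCores
    val launched = Seq.newBuilder[String]
    for {
      entry <- queued
      submission <- Option(entry) // Option(null) is None, so null entries are skipped
    } {
      if (submission.cores <= remaining) {
        remaining -= submission.cores
        launched += submission.id
      }
    }
    launched.result()
  }
}
```

Calling `NullSafeScheduling.schedulable` with a queue that contains a null entry returns only the non-null submissions that fit, and an empty queue yields an empty result without throwing.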