Posted to user@spark.apache.org by ranjanp <pi...@hotmail.com> on 2014/07/18 00:32:40 UTC

Error with spark-submit

Hi, I am new to Spark and am trying it out with a stand-alone, 3-node (1 master, 2 workers) cluster. From the Web UI at the master, I see that the workers are registered. But when I try running the SparkPi example from the master node, I get the following message and then an exception:

14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

I searched a bit for the above warning and found that others have encountered this problem before, but did not see a clear resolution except for this link:
http://apache-spark-user-list.1001560.n3.nabble.com/TaskSchedulerImpl-Initial-job-has-not-accepted-any-resources-check-your-cluster-UI-to-ensure-that-woy-tt8247.html#a8444

Based on the suggestion there I tried supplying the --executor-memory option to spark-submit, but that did not help. Any suggestions?

Here are the details of my setup:

- 3 nodes (each with 4 CPU cores and 7 GB memory)
- 1 node configured as the master, and the other two configured as workers
- Firewall is disabled on all nodes, and network communication between the nodes is not a problem
- Edited conf/spark-env.sh on all nodes to set the following (a sketch of the file appears after the status summary below):

      SPARK_WORKER_CORES=3
      SPARK_WORKER_MEMORY=5G

- The Web UI as well as the logs on the master show that the workers were able to register correctly. The Web UI also correctly shows the aggregate available memory and CPU cores on the workers:

      URL: spark://vmsparkwin1:7077
      Workers: 2
      Cores: 6 Total, 0 Used
      Memory: 10.0 GB Total, 0.0 B Used
      Applications: 0 Running, 0 Completed
      Drivers: 0 Running, 0 Completed
      Status: ALIVE
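
For reference, here is a sketch of what I put in conf/spark-env.sh on each node. The two SPARK_WORKER_* lines are exactly as stated above; the comments are just my reasoning, and I have omitted the commented-out template lines that ship with Spark:

    # conf/spark-env.sh (identical on the master and both workers)
    # Each VM has 4 cores and 7 GB RAM; leave roughly 1 core and 2 GB
    # for the OS and the Spark daemons themselves.
    SPARK_WORKER_CORES=3
    SPARK_WORKER_MEMORY=5G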

I tried running the SparkPi example, first using run-example (which was failing; my invocation is sketched after the spark-submit command below) and later directly using spark-submit, as shown here:

azureuser@vmsparkwin1 /cygdrive/c/opt/spark-1.0.0
$ export MASTER=spark://vmsparkwin1:7077

$ echo $MASTER
spark://vmsparkwin1:7077

azureuser@vmsparkwin1 /cygdrive/c/opt/spark-1.0.0
$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
    --master spark://10.1.3.7:7077 \
    --executor-memory 1G \
    --total-executor-cores 2 \
    ./lib/spark-examples-1.0.0-hadoop2.2.0.jar 10
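
The earlier run-example attempt was, as best I recall, along these lines (I am reconstructing this from memory, so treat the exact argument form as approximate; it picks up the master from the MASTER variable exported above):

    azureuser@vmsparkwin1 /cygdrive/c/opt/spark-1.0.0
    $ ./bin/run-example SparkPi 10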

The following is the full screen output of the spark-submit run:

14/07/17 01:20:13 INFO SecurityManager: Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
14/07/17 01:20:13 INFO SecurityManager: Changing view acls to: azureuser
14/07/17 01:20:13 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(azureuser)
14/07/17 01:20:14 INFO Slf4jLogger: Slf4jLogger started
14/07/17 01:20:14 INFO Remoting: Starting remoting
14/07/17 01:20:14 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
14/07/17 01:20:14 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49839]
14/07/17 01:20:14 INFO SparkEnv: Registering MapOutputTracker
14/07/17 01:20:14 INFO SparkEnv: Registering BlockManagerMaster
14/07/17 01:20:14 INFO DiskBlockManager: Created local directory at C:\cygwin\tmp\spark-local-20140717012014-b606
14/07/17 01:20:14 INFO MemoryStore: MemoryStore started with capacity 294.9 MB.
14/07/17 01:20:14 INFO ConnectionManager: Bound socket to port 49842 with id = ConnectionManagerId(vmsparkwin1.cssparkwin.b1.internal.cloudapp.net,49842)
14/07/17 01:20:14 INFO BlockManagerMaster: Trying to register BlockManager
14/07/17 01:20:14 INFO BlockManagerInfo: Registering block manager vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:49842 with 294.9 MB RAM
14/07/17 01:20:14 INFO BlockManagerMaster: Registered BlockManager
14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
14/07/17 01:20:14 INFO HttpBroadcast: Broadcast server started at http://10.1.3.7:49843
14/07/17 01:20:14 INFO HttpFileServer: HTTP File server directory is C:\cygwin\tmp\spark-6a076e92-53bb-4c7a-9e27-ce53a818146d
14/07/17 01:20:14 INFO HttpServer: Starting HTTP Server
14/07/17 01:20:15 INFO SparkUI: Started SparkUI at http://vmsparkwin1.cssparkwin.b1.internal.cloudapp.net:4040
14/07/17 01:20:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/07/17 01:20:16 INFO SparkContext: Added JAR file:/C:/opt/spark-1.0.0/./lib/spark-examples-1.0.0-hadoop2.2.0.jar at http://10.1.3.7:49844/jars/spark-examples-1.0.0-hadoop2.2.0.jar with timestamp 1405560016316
14/07/17 01:20:16 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
14/07/17 01:20:16 INFO SparkContext: Starting job: reduce at SparkPi.scala:35
14/07/17 01:20:16 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35) with 10 output partitions (allowLocal=false)
14/07/17 01:20:16 INFO DAGScheduler: Final stage: Stage 0(reduce at SparkPi.scala:35)
14/07/17 01:20:16 INFO DAGScheduler: Parents of final stage: List()
14/07/17 01:20:16 INFO DAGScheduler: Missing parents: List()
14/07/17 01:20:16 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at map at SparkPi.scala:31), which has no missing parents
14/07/17 01:20:16 INFO DAGScheduler: Submitting 10 missing tasks from Stage 0 (MappedRDD[1] at map at SparkPi.scala:31)
14/07/17 01:20:16 INFO TaskSchedulerImpl: Adding task set 0.0 with 10 tasks
14/07/17 01:20:31 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/17 01:20:36 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
14/07/17 01:20:46 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/17 01:20:56 INFO AppClient$ClientActor: Connecting to master spark://10.1.3.7:7077...
14/07/17 01:21:01 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/07/17 01:21:16 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/07/17 01:21:16 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/07/17 01:21:16 INFO TaskSchedulerImpl: Cancelling stage 0
14/07/17 01:21:16 INFO DAGScheduler: Failed to run reduce at SparkPi.scala:35
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
        at akka.actor.ActorCell.invoke(ActorCell.scala:456)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
        at akka.dispatch.Mailbox.run(Mailbox.scala:219)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


