Posted to user@mesos.apache.org by Brian Belgodere <bb...@gmail.com> on 2015/01/20 19:11:28 UTC
Spark 1.2.0 resource issue with Mesos 0.21.1
Hi All,
I'm running into a weird issue with my test Mesos cluster. It's a 3
master / 3 slave HA configuration. Marathon and Chronos are working as
they should, and I can deploy Dockerized applications to the slave nodes
without issue using Marathon. I downloaded Spark 1.2 and built it from
source. Standalone mode works correctly, but when I submit jobs to the
Mesos cluster from Spark, it connects and shows up as a framework, yet I
get "Initial job has not accepted any resources; check your cluster UI
to ensure that workers are registered and have sufficient memory". I've
tried both coarse-grained and fine-grained mode with the same result.
I have appended the relevant info below and I appreciate any help with
this.
-Brian
I'm running on Ubuntu Trusty 64-bit.
My spark-env.sh contains:
export MESOS_NATIVE_LIBRARY=/usr/local/lib/libmesos.so
export SPARK_EXECUTOR_URI=http://192.0.3.11:8081/spark-1.2.0.tgz
export MASTER=mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos
export SPARK_WORKER_MEMORY=512M
export SPARK_WORKER_CORES=1
export SPARK_LOCAL_IP=192.0.3.11
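In case it matters: as far as I know the SPARK_WORKER_* variables are
only read in standalone mode, so under Mesos the equivalent knobs would
normally go in spark-defaults.conf instead. A sketch of what I believe
the equivalent settings look like (the values just mirror my
spark-env.sh above):

```shell
# Write the Mesos-side equivalents of the spark-env.sh settings above
# into spark-defaults.conf (sketch; spark.executor.uri and
# spark.cores.max are the properties Mesos mode reads, as I understand it).
cat > spark-defaults.conf <<'EOF'
spark.executor.memory  512m
spark.cores.max        1
spark.executor.uri     http://192.0.3.11:8081/spark-1.2.0.tgz
EOF
# count the properties just written
grep -c '^spark\.' spark-defaults.conf   # prints 3
```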
My Mesos cluster shows:

  Cluster:  Mesos_Cluster
  Server:   192.0.3.12:5050
  Version:  0.21.1
  Built:    a week ago by root
  Started:  2 hours ago
  Elected:  2 hours ago

  Resources  CPUs  Mem
  Total      3     2.9 GB
  Used       0     0 B
  Offered    0     0 B
  Idle       3     2.9 GB
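To double-check what the master itself reports (rather than the web UI),
one can pull its /master/state.json endpoint and summarize the per-slave
resources. A small sketch, assuming python3 is available; fetch the file
first with e.g. `curl -s http://192.0.3.12:5050/master/state.json > state.json`:

```shell
# Print hostname, cpus, and mem for each registered slave from a saved
# copy of the master's /master/state.json (endpoint present in Mesos 0.21).
summarize_state() {
  python3 - "$1" <<'PY'
import json, sys

state = json.load(open(sys.argv[1]))
for s in state.get("slaves", []):
    r = s.get("resources", {})
    print(s.get("hostname"), r.get("cpus"), r.get("mem"))
PY
}
# usage: summarize_state state.json
```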
In the Spark log I see:
vagrant@master1:~/spark-1.2.0$ ./bin/run-example SparkPi 3
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/01/19 02:41:40 INFO SecurityManager: Changing view acls to: vagrant
15/01/19 02:41:40 INFO SecurityManager: Changing modify acls to: vagrant
15/01/19 02:41:40 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(vagrant);
users with modify permissions: Set(vagrant)
15/01/19 02:41:41 INFO Slf4jLogger: Slf4jLogger started
15/01/19 02:41:41 INFO Remoting: Starting remoting
15/01/19 02:41:42 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@master1:56626]
15/01/19 02:41:42 INFO Utils: Successfully started service 'sparkDriver' on
port 56626.
15/01/19 02:41:42 INFO SparkEnv: Registering MapOutputTracker
15/01/19 02:41:42 INFO SparkEnv: Registering BlockManagerMaster
15/01/19 02:41:42 INFO DiskBlockManager: Created local directory at
/tmp/spark-local-20150119024142-16af
15/01/19 02:41:42 INFO MemoryStore: MemoryStore started with capacity 267.3
MB
15/01/19 02:41:42 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-80342d7e-780f-4550-933d-adce88265322
15/01/19 02:41:42 INFO HttpServer: Starting HTTP Server
15/01/19 02:41:42 INFO Utils: Successfully started service 'HTTP file
server' on port 36273.
15/01/19 02:41:43 INFO Utils: Successfully started service 'SparkUI' on
port 4040.
15/01/19 02:41:43 INFO SparkUI: Started SparkUI at http://master1:4040
15/01/19 02:41:43 INFO SparkContext: Added JAR
file:/home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
at http://192.0.3.11:36273/jars/spark-examples-1.2.0-hadoop1.0.4.jar with
timestamp 1421635303639
2015-01-19 02:41:44,069:19208(0x7f7da54b3700):ZOO_INFO@log_env@712: Client
environment:zookeeper.version=zookeeper C client 3.4.5
2015-01-19 02:41:44,070:19208(0x7f7da54b3700):ZOO_INFO@log_env@716: Client
environment:host.name=master1
2015-01-19 02:41:44,070:19208(0x7f7da54b3700):ZOO_INFO@log_env@723: Client
environment:os.name=Linux
2015-01-19 02:41:44,071:19208(0x7f7da54b3700):ZOO_INFO@log_env@724: Client
environment:os.arch=3.13.0-43-generic
2015-01-19 02:41:44,071:19208(0x7f7da54b3700):ZOO_INFO@log_env@725: Client
environment:os.version=#72-Ubuntu SMP Mon Dec 8 19:35:06 UTC 2014
2015-01-19 02:41:44,072:19208(0x7f7da54b3700):ZOO_INFO@log_env@733: Client
environment:user.name=vagrant
2015-01-19 02:41:44,072:19208(0x7f7da54b3700):ZOO_INFO@log_env@741: Client
environment:user.home=/home/vagrant
2015-01-19 02:41:44,073:19208(0x7f7da54b3700):ZOO_INFO@log_env@753: Client
environment:user.dir=/home/vagrant/spark-1.2.0
2015-01-19 02:41:44,073:19208(0x7f7da54b3700):ZOO_INFO@zookeeper_init@786:
Initiating client connection, host=192.0.3.11:2181,192.0.3.12:2181,
192.0.3.13:2181sessionTimeout=10000 watcher=0x7f7daa4516a0 sessionId=0
sessionPasswd=<null> context=0xcf0a60 flags=0
2015-01-19 02:41:44,077:19208(0x7f7da3cb0700):ZOO_INFO@check_events@1703:
initiated connection to server [192.0.3.13:2181]
2015-01-19 02:41:44,080:19208(0x7f7da3cb0700):ZOO_INFO@check_events@1750:
session establishment complete on server [192.0.3.13:2181],
sessionId=0x34aff9e627f000e, negotiated timeout=10000
I0119 02:41:44.082293 19313 sched.cpp:137] Version: 0.21.1
I0119 02:41:44.088546 19315 group.cpp:313] Group process (group(1)@
192.0.3.11:50317) connected to ZooKeeper
I0119 02:41:44.088948 19315 group.cpp:790] Syncing group operations: queue
size (joins, cancels, datas) = (0, 0, 0)
I0119 02:41:44.089274 19315 group.cpp:385] Trying to create path '/mesos'
in ZooKeeper
I0119 02:41:44.112208 19320 detector.cpp:138] Detected a new leader:
(id='2')
I0119 02:41:44.113049 19315 group.cpp:659] Trying to get
'/mesos/info_0000000002' in ZooKeeper
I0119 02:41:44.115067 19316 detector.cpp:433] A new leading master (UPID=
master@192.0.3.12:5050) is detected
I0119 02:41:44.118728 19317 sched.cpp:234] New master detected at
master@192.0.3.12:5050
I0119 02:41:44.119282 19317 sched.cpp:242] No credentials provided.
Attempting to register without authentication
I0119 02:41:44.123064 19317 sched.cpp:408] Framework registered with
20150119-003609-201523392-5050-7198-0002
15/01/19 02:41:44 INFO MesosSchedulerBackend: Registered as framework ID
20150119-003609-201523392-5050-7198-0002
15/01/19 02:41:44 INFO NettyBlockTransferService: Server created on 54462
15/01/19 02:41:44 INFO BlockManagerMaster: Trying to register BlockManager
15/01/19 02:41:44 INFO BlockManagerMasterActor: Registering block manager
master1:54462 with 267.3 MB RAM, BlockManagerId(<driver>, master1, 54462)
15/01/19 02:41:44 INFO BlockManagerMaster: Registered BlockManager
15/01/19 02:41:44 INFO SparkContext: Starting job: reduce at
SparkPi.scala:35
15/01/19 02:41:44 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35)
with 3 output partitions (allowLocal=false)
15/01/19 02:41:44 INFO DAGScheduler: Final stage: Stage 0(reduce at
SparkPi.scala:35)
15/01/19 02:41:44 INFO DAGScheduler: Parents of final stage: List()
15/01/19 02:41:44 INFO DAGScheduler: Missing parents: List()
15/01/19 02:41:44 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[1] at
map at SparkPi.scala:31), which has no missing parents
15/01/19 02:41:45 INFO MemoryStore: ensureFreeSpace(1728) called with
curMem=0, maxMem=280248975
15/01/19 02:41:45 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1728.0 B, free 267.3 MB)
15/01/19 02:41:45 INFO MemoryStore: ensureFreeSpace(1235) called with
curMem=1728, maxMem=280248975
15/01/19 02:41:45 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1235.0 B, free 267.3 MB)
15/01/19 02:41:45 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory
on master1:54462 (size: 1235.0 B, free: 267.3 MB)
15/01/19 02:41:45 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/01/19 02:41:45 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:838
15/01/19 02:41:45 INFO DAGScheduler: Submitting 3 missing tasks from Stage
0 (MappedRDD[1] at map at SparkPi.scala:31)
15/01/19 02:41:45 INFO TaskSchedulerImpl: Adding task set 0.0 with 3 tasks
15/01/19 02:42:00 WARN TaskSchedulerImpl: Initial job has not accepted any
resources; check your cluster UI to ensure that workers are registered and
have sufficient memory
and that warning just keeps repeating.
I have verified that http://192.0.3.11:8081/spark-1.2.0.tgz is accessible
from all the slave nodes.
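For reference, the per-slave check can be scripted roughly like this
(the slave hostnames are placeholders; this assumes ssh access to the
slaves and curl installed on them):

```shell
# Verify the executor tarball URI is fetchable from each slave
# (sketch; -f makes curl fail on HTTP errors, -I avoids a full download).
check_uri_from_slaves() {
  uri=$1; shift   # $1 is the tarball URI, remaining args are slave hosts
  for h in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" \
         "curl -sfI --max-time 10 '$uri' >/dev/null"; then
      echo "$h: ok"
    else
      echo "$h: FAILED"
    fi
  done
}
# e.g.: check_uri_from_slaves http://192.0.3.11:8081/spark-1.2.0.tgz slave1 slave2 slave3
```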
My Spark environment variables list:

Runtime Information
  Java Home     = /usr/lib/jvm/java-7-openjdk-amd64/jre
  Java Version  = 1.7.0_65 (Oracle Corporation)
  Scala Version = version 2.10.4

Spark Properties
  spark.app.id                  = 20150119-003609-201523392-5050-7198-0005
  spark.app.name                = Spark Pi
  spark.driver.host             = master1
  spark.driver.port             = 46107
  spark.executor.id             = driver
  spark.fileserver.uri          = http://192.0.3.11:55424
  spark.jars                    = file:/home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
  spark.master                  = mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos
  spark.scheduler.mode          = FIFO
  spark.tachyonStore.folderName = spark-3dffd4bb-f23b-43f7-a498-54b401dc591b
System Properties
  SPARK_SUBMIT                  = true
  awt.toolkit                   = sun.awt.X11.XToolkit
  file.encoding                 = UTF-8
  file.encoding.pkg             = sun.io
  file.separator                = /
  java.awt.graphicsenv          = sun.awt.X11GraphicsEnvironment
  java.awt.printerjob           = sun.print.PSPrinterJob
  java.class.version            = 51.0
  java.endorsed.dirs            = /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/endorsed
  java.ext.dirs                 = /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/ext:/usr/java/packages/lib/ext
  java.home                     = /usr/lib/jvm/java-7-openjdk-amd64/jre
  java.io.tmpdir                = /tmp
  java.library.path             = /usr/java/packages/lib/amd64:/usr/lib/x86_64-linux-gnu/jni:/lib/x86_64-linux-gnu:/usr/lib/x86_64-linux-gnu:/usr/lib/jni:/lib:/usr/lib
  java.runtime.name             = OpenJDK Runtime Environment
  java.runtime.version          = 1.7.0_65-b32
  java.specification.name       = Java Platform API Specification
  java.specification.vendor     = Oracle Corporation
  java.specification.version    = 1.7
  java.vendor                   = Oracle Corporation
  java.vendor.url               = http://java.oracle.com/
  java.vendor.url.bug           = http://bugreport.sun.com/bugreport/
  java.version                  = 1.7.0_65
  java.vm.info                  = mixed mode
  java.vm.name                  = OpenJDK 64-Bit Server VM
  java.vm.specification.name    = Java Virtual Machine Specification
  java.vm.specification.vendor  = Oracle Corporation
  java.vm.specification.version = 1.7
  java.vm.vendor                = Oracle Corporation
  java.vm.version               = 24.65-b04
  line.separator                =
  os.arch                       = amd64
  os.name                       = Linux
  os.version                    = 3.13.0-43-generic
  path.separator                = :
  sun.arch.data.model           = 64
  sun.boot.class.path           = /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/resources.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rt.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/sunrsasign.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jsse.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jce.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/charsets.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/rhino.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/jfr.jar:/usr/lib/jvm/java-7-openjdk-amd64/jre/classes
  sun.boot.library.path         = /usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64
  sun.cpu.endian                = little
  sun.cpu.isalist               =
  sun.io.unicode.encoding       = UnicodeLittle
  sun.java.command              = org.apache.spark.deploy.SparkSubmit --master mesos://zk://192.0.3.11:2181,192.0.3.12:2181,192.0.3.13:2181/mesos --class org.apache.spark.examples.SparkPi /home/vagrant/spark-1.2.0/examples/target/scala-2.10/spark-examples-1.2.0-hadoop1.0.4.jar
  sun.java.launcher             = SUN_STANDARD
  sun.jnu.encoding              = UTF-8
  sun.management.compiler       = HotSpot 64-Bit Tiered Compilers
  sun.nio.ch.bugLevel           =
  sun.os.patch.level            = unknown
  user.country                  = US
  user.dir                      = /home/vagrant/spark-1.2.0
  user.home                     = /home/vagrant
  user.language                 = en
  user.name                     = vagrant
  user.timezone                 = Etc/UTC
Classpath Entries
  /home/vagrant/spark-1.2.0/assembly/target/scala-2.10/spark-assembly-1.2.0-hadoop1.0.4.jar  (System Classpath)
  /home/vagrant/spark-1.2.0/conf  (System Classpath)
  http://192.0.3.11:55424/jars/spark-examples-1.2.0-hadoop1.0.4.jar  (Added By User)