Posted to user@spark.apache.org by seglo <wl...@gmail.com> on 2015/03/29 21:12:42 UTC
Can't run spark-submit with an application jar on a Mesos cluster
Mesosphere did a great job of simplifying the process of running Spark on
Mesos. I am using this guide to set up a development Mesos cluster on
Google Compute Engine.
https://mesosphere.com/docs/tutorials/run-spark-on-mesos/
I can run the example that's in the guide by using spark-shell (finding
numbers less than 10). However, when I attempt to submit an application that
otherwise works fine with Spark locally it blows up with TASK_FAILED
messages (i.e. CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_FAILED).
Here's the command I'm using with the provided Spark Pi example.
./spark-submit --class org.apache.spark.examples.SparkPi --master
mesos://10.173.40.36:5050
~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
And the output:
jclouds@development-5159-d9:~/learning-spark$
~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class
org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050
~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/22 16:44:02 INFO SparkContext: Running Spark version 1.3.0
15/03/22 16:44:02 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/03/22 16:44:03 INFO SecurityManager: Changing view acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: Changing modify acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(jclouds); users
with modify permissions: Set(jclouds)
15/03/22 16:44:03 INFO Slf4jLogger: Slf4jLogger started
15/03/22 16:44:03 INFO Remoting: Starting remoting
15/03/22 16:44:03 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:60301]
15/03/22 16:44:03 INFO Utils: Successfully started service 'sparkDriver' on
port 60301.
15/03/22 16:44:03 INFO SparkEnv: Registering MapOutputTracker
15/03/22 16:44:03 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 16:44:03 INFO DiskBlockManager: Created local directory at
/tmp/spark-27fad7e3-4ad7-44d6-845f-4a09ac9cce90/blockmgr-a558b7be-0d72-49b9-93fd-5ef8731b314b
15/03/22 16:44:03 INFO MemoryStore: MemoryStore started with capacity 265.0
MB
15/03/22 16:44:04 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-de9ac795-381b-4acd-a723-a9a6778773c9/httpd-7115216c-0223-492b-ae6f-4134ba7228ba
15/03/22 16:44:04 INFO HttpServer: Starting HTTP Server
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started
SocketConnector@0.0.0.0:36663
15/03/22 16:44:04 INFO Utils: Successfully started service 'HTTP file
server' on port 36663.
15/03/22 16:44:04 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
15/03/22 16:44:04 INFO Utils: Successfully started service 'SparkUI' on port
4040.
15/03/22 16:44:04 INFO SparkUI: Started SparkUI at
http://development-5159-d9.c.learning-spark.internal:4040
15/03/22 16:44:04 INFO SparkContext: Added JAR
file:/home/jclouds/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar
at http://10.173.40.36:36663/jars/spark-examples-1.3.0-hadoop2.4.0.jar with
timestamp 1427042644934
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
instead. Future releases will not support JNI bindings via
MESOS_NATIVE_LIBRARY.
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
instead. Future releases will not support JNI bindings via
MESOS_NATIVE_LIBRARY.
I0322 16:44:05.035423 308 sched.cpp:137] Version: 0.21.1
I0322 16:44:05.038136 309 sched.cpp:234] New master detected at
master@10.173.40.36:5050
I0322 16:44:05.039261 309 sched.cpp:242] No credentials provided.
Attempting to register without authentication
I0322 16:44:05.040351 310 sched.cpp:408] Framework registered with
20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Registered as framework
ID 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO NettyBlockTransferService: Server created on 44177
15/03/22 16:44:05 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 16:44:05 INFO BlockManagerMasterActor: Registering block manager
development-5159-d9.c.learning-spark.internal:44177 with 265.0 MB RAM,
BlockManagerId(<driver>, development-5159-d9.c.learning-spark.internal,
44177)
15/03/22 16:44:05 INFO BlockManagerMaster: Registered BlockManager
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.0
15/03/22 16:44:05 INFO SparkContext: Starting job: reduce at
SparkPi.scala:35
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35)
with 100 output partitions (allowLocal=false)
15/03/22 16:44:05 INFO DAGScheduler: Final stage: Stage 0(reduce at
SparkPi.scala:35)
15/03/22 16:44:05 INFO DAGScheduler: Parents of final stage: List()
15/03/22 16:44:05 INFO DAGScheduler: Missing parents: List()
15/03/22 16:44:05 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1]
at map at SparkPi.scala:31), which has no missing parents
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave
value: "20150322-040336-606645514-5050-2744-S1"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: "20150322-040336-606645514-5050-2744-S0"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1848) called with
curMem=0, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1848.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1296) called with
curMem=1848, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1296.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on development-5159-d9.c.learning-spark.internal:44177 (size: 1296.0
B, free: 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/03/22 16:44:05 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:839
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now
TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Submitting 100 missing tasks from
Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/03/22 16:44:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 100
tasks
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: "20150322-040336-606645514-5050-2744-S2"
due to too many failures; is Spark installed on it?
15/03/22 16:44:20 WARN TaskSchedulerImpl: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered
and have sufficient resources
I suspect it may have something to do with the Mesos slave nodes not
finding the application jar, but when I put it in HDFS and provide the URL
to it, spark-submit tells me it will "Skip remote jar".
jclouds@development-5159-d9:~/learning-spark$
~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class
org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050
hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Warning: Skip remote jar
hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:266)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
I created a StackOverflow question for this as well
http://stackoverflow.com/questions/29198522/cant-run-spark-submit-with-an-application-jar-on-a-mesos-cluster
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Can-t-run-spark-submit-with-an-application-jar-on-a-Mesos-cluster-tp22277.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by Timothy Chen <ti...@mesosphere.io>.
I left a comment on your StackOverflow question earlier. Can you share the
output of the stderr log from your Mesos task? It can be found in the
Mesos UI by going to the task's sandbox.
Tim
Sent from my iPhone
> On Mar 29, 2015, at 12:14 PM, seglo <wl...@gmail.com> wrote:
>
> The latter part of this question where I try to submit the application by
> referring to it on HDFS is very similar to the recent question
>
> Spark-submit not working when application jar is in hdfs
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html
>
>
>
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by seglo <wl...@gmail.com>.
Thanks hbogert. There it is, plain as day: it can't find my Spark binaries.
I thought it was enough to set SPARK_EXECUTOR_URI in my spark-env.sh, since
that is all that's necessary to run spark-shell against a Mesos master, but
I also had to set spark.executor.uri in my spark-defaults.conf (or in the
app itself). Thanks again for your help troubleshooting this problem.
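The fix described above amounts to one extra line of configuration. A
minimal sketch, assuming you have uploaded a Spark distribution tarball to
HDFS (the conf directory and HDFS URL below are placeholders for this
example; in a real setup they would be $SPARK_HOME/conf and wherever the
tarball actually lives):

```shell
# Sketch: point spark.executor.uri at a Spark tarball the executors can
# fetch themselves, instead of assuming Spark is pre-installed on each slave.
# SPARK_CONF_DIR and the HDFS URL are placeholders for illustration.
SPARK_CONF_DIR=$(mktemp -d)
echo "spark.executor.uri hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz" \
  >> "$SPARK_CONF_DIR/spark-defaults.conf"
cat "$SPARK_CONF_DIR/spark-defaults.conf"
```

The same property can instead be set programmatically on the SparkConf
before creating the SparkContext, which is what "or in the app itself"
refers to.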
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave
20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c ' "/home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class"
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler
--executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180
--cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by hbogert <ha...@gmail.com>.
Well, those are only the logs of the slaves at the Mesos level. I'm not
sure from your reply whether you can ssh into a specific slave or not; if
you can, you should look at the actual output of the application (Spark in
this case) on a slave, in e.g.
/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}
The actual UUIDs and run number (in this example '4') in the path can
differ from slave node to slave node.
Look into those stderr and stdout files and you'll probably have your
answer as to why it is failing.
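The directory walk described above can be sketched as a small loop over
the slave's work directory. The mock tree built here only illustrates the
path layout; on a real slave you would point WORKDIR at the actual work_dir
(by default /tmp/mesos), and the UUIDs and run IDs would differ:

```shell
# Build a mock sandbox tree matching the layout described above, then walk
# it and print every executor's stderr. The names S1/F1/4/latest are
# stand-ins for real slave, framework, executor, and run identifiers.
WORKDIR=$(mktemp -d)
run="$WORKDIR/slaves/S1/frameworks/F1/executors/4/runs/latest"
mkdir -p "$run"
echo "sh: 1: .../bin/spark-class: not found" > "$run/stderr"

find "$WORKDIR/slaves" -type f -name stderr | while read -r f; do
  printf '== %s ==\n' "$f"
  cat "$f"
done
```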
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by seglo <wl...@gmail.com>.
Thanks for the response. I'll admit I'm rather new to Mesos. Due to the
nature of my setup I can't use the Mesos web portal effectively: I'm not
connected by VPN, so the local network links from the mesos-master
dashboard I SSH-tunnelled aren't working.
Anyway, I was able to dig up some logs for a failed job (framework?) run on
one of my slaves, "20150322-040336-606645514-5050-2744-0037":
$ cat mesos-slave.INFO | grep 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004115 2524 slave.cpp:1083] Got assigned task 1 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004812 2524 slave.cpp:1193] Launching task 1 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.005879 2524 slave.cpp:3997] Launching executor 1 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.006145 2524 slave.cpp:1316] Queuing task '1' for executor 1
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.006722 2531 containerizer.cpp:424] Starting container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b' for executor '1' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.089171 2529 slave.cpp:2840] Monitoring executor '1' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.108610 2529 slave.cpp:1860] Got registration for executor
'1' of framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.109136 2529 slave.cpp:1979] Flushing queued task 1 for
executor '1' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.112584 2527 slave.cpp:2215] Handling status update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.112751 2527 status_update_manager.cpp:317] Received status
update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.113052 2527 slave.cpp:2458] Forwarding the update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.113131 2527 slave.cpp:2391] Sending acknowledgement for
status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.115972 2527 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.214292 2530 slave.cpp:2215] Handling status update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.215005 2526 status_update_manager.cpp:317] Received status
update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.215144 2526 slave.cpp:2458] Forwarding the update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.215277 2526 slave.cpp:2391] Sending acknowledgement for
status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.222218 2524 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239357 2524 slave.cpp:1083] Got assigned task 4 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239853 2524 slave.cpp:1193] Launching task 4 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.240880 2524 slave.cpp:3997] Launching executor 4 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.241065 2524 slave.cpp:1316] Queuing task '4' for executor 4
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.241554 2528 containerizer.cpp:424] Starting container
'e3cf195d-525b-4148-aa38-1789d378a948' for executor '4' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.292538 2527 slave.cpp:2840] Monitoring executor '4' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.313694 2527 slave.cpp:1860] Got registration for executor
'4' of framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.314398 2527 slave.cpp:1979] Flushing queued task 4 for
executor '4' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.324579 2531 slave.cpp:2215] Handling status update
TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.324774 2527 status_update_manager.cpp:317] Received status
update TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.325001 2531 slave.cpp:2458] Forwarding the update
TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.325150 2531 slave.cpp:2391] Sending acknowledgement for
status update TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for
task 4 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:55646
I0329 20:34:26.328096 2529 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task
4 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.425070 2526 slave.cpp:2215] Handling status update
TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.425870 2526 status_update_manager.cpp:317] Received status
update TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.426008 2526 slave.cpp:2458] Forwarding the update
TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.426118 2526 slave.cpp:2391] Sending acknowledgement for
status update TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for
task 4 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:55646
I0329 20:34:26.429636 2528 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task
4 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.306196 2528 slave.cpp:2898] Executor '1' of framework
20150322-040336-606645514-5050-2744-0037 exited with status 0
I0329 20:34:27.306296 2528 slave.cpp:3007] Cleaning up executor '1' of
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.306550 2531 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
for gc 6.99999645247704days in the future
I0329 20:34:27.306653 2531 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1'
for gc 6.99999645160889days in the future
I0329 20:34:27.503298 2524 slave.cpp:2898] Executor '4' of framework
20150322-040336-606645514-5050-2744-0037 exited with status 0
I0329 20:34:27.503384 2524 slave.cpp:3007] Cleaning up executor '4' of
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503510 2526 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
for gc 6.99999417290667days in the future
I0329 20:34:27.503553 2524 slave.cpp:3084] Cleaning up framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503566 2526 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4'
for gc 6.99999417236148days in the future
I0329 20:34:27.503608 2526 status_update_manager.cpp:279] Closing status
update streams for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503638 2524 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037'
for gc 6.99999417116741days in the future
I0329 20:35:50.453316 2526 slave.cpp:1533] Asked to shut down framework
20150322-040336-606645514-5050-2744-0037 by master@10.173.40.36:5050
W0329 20:35:50.453419 2526 slave.cpp:1548] Cannot shut down unknown
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:39:26.006376 2530 slave.cpp:3237] Framework
20150322-040336-606645514-5050-2744-0037 seems to have exited. Ignoring
registration timeout for executor '1'
I0329 20:39:26.241459 2524 slave.cpp:3237] Framework
20150322-040336-606645514-5050-2744-0037 seems to have exited. Ignoring
registration timeout for executor '4'
$ cat mesos-slave.WARNING | grep 20150322-040336-606645514-5050-2744-0037
W0329 20:35:50.453419 2526 slave.cpp:1548] Cannot shut down unknown
framework 20150322-040336-606645514-5050-2744-0037
There's nothing in mesos-slave.ERROR for this framework ID.
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by hbogert <ha...@gmail.com>.
Hi,
What do the Mesos slave logs say? Usually they give a clear-cut error;
they are probably local on a slave node.
I'm not sure about your config, so I can't pinpoint a specific path, but
it might look something like:
/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-0000/executors/5/runs/latest/stderr
Re: Can't run spark-submit with an application jar on a Mesos cluster
Posted by seglo <wl...@gmail.com>.
The latter part of this question where I try to submit the application by
referring to it on HDFS is very similar to the recent question
Spark-submit not working when application jar is in hdfs
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html