Posted to user@spark.apache.org by seglo <wl...@gmail.com> on 2015/03/29 21:12:42 UTC

Can't run spark-submit with an application jar on a Mesos cluster

Mesosphere did a great job of simplifying the process of running Spark on
Mesos. I am using this guide to set up a development Mesos cluster on Google
Cloud Compute.

https://mesosphere.com/docs/tutorials/run-spark-on-mesos/

I can run the example that's in the guide by using spark-shell (finding
numbers less than 10). However, when I attempt to submit an application that
otherwise works fine with Spark locally, it blows up with TASK_FAILED
messages (e.g. CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_FAILED).

Here's the command I'm using with the provided Spark Pi example.

./spark-submit --class org.apache.spark.examples.SparkPi --master
mesos://10.173.40.36:5050
~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100

And the output:

jclouds@development-5159-d9:~/learning-spark$
~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class
org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050
~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties
15/03/22 16:44:02 INFO SparkContext: Running Spark version 1.3.0
15/03/22 16:44:02 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
15/03/22 16:44:03 INFO SecurityManager: Changing view acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: Changing modify acls to: jclouds
15/03/22 16:44:03 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(jclouds); users
with modify permissions: Set(jclouds)
15/03/22 16:44:03 INFO Slf4jLogger: Slf4jLogger started
15/03/22 16:44:03 INFO Remoting: Starting remoting
15/03/22 16:44:03 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:60301]
15/03/22 16:44:03 INFO Utils: Successfully started service 'sparkDriver' on
port 60301.
15/03/22 16:44:03 INFO SparkEnv: Registering MapOutputTracker
15/03/22 16:44:03 INFO SparkEnv: Registering BlockManagerMaster
15/03/22 16:44:03 INFO DiskBlockManager: Created local directory at
/tmp/spark-27fad7e3-4ad7-44d6-845f-4a09ac9cce90/blockmgr-a558b7be-0d72-49b9-93fd-5ef8731b314b
15/03/22 16:44:03 INFO MemoryStore: MemoryStore started with capacity 265.0
MB
15/03/22 16:44:04 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-de9ac795-381b-4acd-a723-a9a6778773c9/httpd-7115216c-0223-492b-ae6f-4134ba7228ba
15/03/22 16:44:04 INFO HttpServer: Starting HTTP Server
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started
SocketConnector@0.0.0.0:36663
15/03/22 16:44:04 INFO Utils: Successfully started service 'HTTP file
server' on port 36663.
15/03/22 16:44:04 INFO SparkEnv: Registering OutputCommitCoordinator
15/03/22 16:44:04 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/22 16:44:04 INFO AbstractConnector: Started
SelectChannelConnector@0.0.0.0:4040
15/03/22 16:44:04 INFO Utils: Successfully started service 'SparkUI' on port
4040.
15/03/22 16:44:04 INFO SparkUI: Started SparkUI at
http://development-5159-d9.c.learning-spark.internal:4040
15/03/22 16:44:04 INFO SparkContext: Added JAR
file:/home/jclouds/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar
at http://10.173.40.36:36663/jars/spark-examples-1.3.0-hadoop2.4.0.jar with
timestamp 1427042644934
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
instead. Future releases will not support JNI bindings via
MESOS_NATIVE_LIBRARY.
Warning: MESOS_NATIVE_LIBRARY is deprecated, use MESOS_NATIVE_JAVA_LIBRARY
instead. Future releases will not support JNI bindings via
MESOS_NATIVE_LIBRARY.
I0322 16:44:05.035423   308 sched.cpp:137] Version: 0.21.1
I0322 16:44:05.038136   309 sched.cpp:234] New master detected at
master@10.173.40.36:5050
I0322 16:44:05.039261   309 sched.cpp:242] No credentials provided.
Attempting to register without authentication
I0322 16:44:05.040351   310 sched.cpp:408] Framework registered with
20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Registered as framework
ID 20150322-040336-606645514-5050-2744-0019
15/03/22 16:44:05 INFO NettyBlockTransferService: Server created on 44177
15/03/22 16:44:05 INFO BlockManagerMaster: Trying to register BlockManager
15/03/22 16:44:05 INFO BlockManagerMasterActor: Registering block manager
development-5159-d9.c.learning-spark.internal:44177 with 265.0 MB RAM,
BlockManagerId(<driver>, development-5159-d9.c.learning-spark.internal,
44177)
15/03/22 16:44:05 INFO BlockManagerMaster: Registered BlockManager
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 2 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 1 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 0 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio:
0.0
15/03/22 16:44:05 INFO SparkContext: Starting job: reduce at
SparkPi.scala:35
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now
TASK_RUNNING
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Got job 0 (reduce at SparkPi.scala:35)
with 100 output partitions (allowLocal=false)
15/03/22 16:44:05 INFO DAGScheduler: Final stage: Stage 0(reduce at
SparkPi.scala:35)
15/03/22 16:44:05 INFO DAGScheduler: Parents of final stage: List()
15/03/22 16:44:05 INFO DAGScheduler: Missing parents: List()
15/03/22 16:44:05 INFO DAGScheduler: Submitting Stage 0 (MapPartitionsRDD[1]
at map at SparkPi.scala:31), which has no missing parents
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 3 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos slave
value: "20150322-040336-606645514-5050-2744-S1"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 4 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: "20150322-040336-606645514-5050-2744-S0"
due to too many failures; is Spark installed on it?
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1848) called with
curMem=0, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0 stored as values in
memory (estimated size 1848.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO MemoryStore: ensureFreeSpace(1296) called with
curMem=1848, maxMem=277842493
15/03/22 16:44:05 INFO MemoryStore: Block broadcast_0_piece0 stored as
bytes in memory (estimated size 1296.0 B, free 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerInfo: Added broadcast_0_piece0 in
memory on development-5159-d9.c.learning-spark.internal:44177 (size: 1296.0
B, free: 265.0 MB)
15/03/22 16:44:05 INFO BlockManagerMaster: Updated info of block
broadcast_0_piece0
15/03/22 16:44:05 INFO SparkContext: Created broadcast 0 from broadcast at
DAGScheduler.scala:839
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now
TASK_RUNNING
15/03/22 16:44:05 INFO DAGScheduler: Submitting 100 missing tasks from
Stage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:31)
15/03/22 16:44:05 INFO TaskSchedulerImpl: Adding task set 0.0 with 100
tasks
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Mesos task 5 is now
TASK_FAILED
15/03/22 16:44:05 INFO CoarseMesosSchedulerBackend: Blacklisting Mesos
slave value: "20150322-040336-606645514-5050-2744-S2"
due to too many failures; is Spark installed on it?
15/03/22 16:44:20 WARN TaskSchedulerImpl: Initial job has not accepted
any resources; check your cluster UI to ensure that workers are registered
and have sufficient resources

I suspect it may have something to do with the Mesos slave nodes not finding
the application jar, but when I put it in HDFS and provide the URL to it,
spark-submit simply reports "Warning: Skip remote jar".
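
For reference, getting the jar into HDFS was just a plain copy, roughly like
the following (the exact invocation isn't shown here, so treat it as a
sketch):

    hadoop fs -put ~/spark-1.3.0-bin-hadoop2.4/lib/spark-examples-1.3.0-hadoop2.4.0.jar \
        hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar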

jclouds@development-5159-d9:~/learning-spark$
~/spark-1.3.0-bin-hadoop2.4/bin/spark-submit --class
org.apache.spark.examples.SparkPi --master mesos://10.173.40.36:5050
hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar 100
Spark assembly has been built with Hive, including Datanucleus jars on
classpath
Warning: Skip remote jar
hdfs://10.173.40.36/tmp/spark-examples-1.3.0-hadoop2.4.0.jar.
java.lang.ClassNotFoundException: org.apache.spark.examples.SparkPi
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:538)
        at
org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
        at
org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Using Spark's default log4j profile:
org/apache/spark/log4j-defaults.properties

I created a StackOverflow question for this as well

http://stackoverflow.com/questions/29198522/cant-run-spark-submit-with-an-application-jar-on-a-mesos-cluster





Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by Timothy Chen <ti...@mesosphere.io>.
I left a comment on your StackOverflow question earlier. Can you share the output of the stderr log from your Mesos task? It
can be found in the Mesos UI by going to the task's sandbox.
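
If the web UI is hard to reach, the same information is also available over
HTTP from the master's state endpoint; a rough sketch (master address taken
from this thread, endpoint name as of Mesos 0.21):

    # list frameworks, their tasks, and the slaves they ran on
    curl -s http://10.173.40.36:5050/master/state.json | python -m json.tool | less

    # the sandbox for a task then lives on that slave under its work_dir, e.g.
    # /tmp/mesos/slaves/<slave-id>/frameworks/<framework-id>/executors/<executor-id>/runs/latest/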

Tim

Sent from my iPhone

> On Mar 29, 2015, at 12:14 PM, seglo <wl...@gmail.com> wrote:
> 
> The latter part of this question where I try to submit the application by
> referring to it on HDFS is very similar to the recent question
> 
> Spark-submit not working when application jar is in hdfs
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html
> 


Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by seglo <wl...@gmail.com>.
Thanks hbogert.  There it is, plain as day: it can't find my Spark binaries.
I thought it was enough to set SPARK_EXECUTOR_URI in my spark-env.sh, since
that is all that's necessary to run spark-shell against a Mesos master, but I
also had to set spark.executor.uri in my spark-defaults.conf (or in my app
itself).  Thanks again for your help troubleshooting this problem.
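
For the record, the fix looks roughly like this (a sketch; the HDFS location
of the Spark distribution tarball is an assumption, since the actual URI I
used isn't shown in this thread):

    # spark-defaults.conf
    spark.executor.uri    hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz

    # or per job, on the spark-submit command line
    ./spark-submit --conf spark.executor.uri=hdfs://10.173.40.36/tmp/spark-1.3.0-bin-hadoop2.4.tgz ...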

jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stderr
I0329 20:34:26.107267 10026 exec.cpp:132] Version: 0.21.1
I0329 20:34:26.109591 10031 exec.cpp:206] Executor registered on slave
20150322-040336-606645514-5050-2744-S1
sh: 1: /home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class: not found
jclouds@development-5159-d3d:/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/latest$
cat stdout
Registered executor on 10.217.7.180
Starting task 1
Forked command at 10036
sh -c ' "/home/jclouds/spark-1.3.0-bin-hadoop2.4/bin/spark-class"
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
akka.tcp://sparkDriver@development-5159-d9.c.learning-spark.internal:54746/user/CoarseGrainedScheduler
--executor-id 20150322-040336-606645514-5050-2744-S1 --hostname 10.217.7.180
--cores 10 --app-id 20150322-040336-606645514-5050-2744-0037'
Command exited with status 127 (pid: 10036)








Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by hbogert <ha...@gmail.com>.
Well, those are only the slave logs at the Mesos level. I'm not sure from
your reply whether you can SSH into a specific slave or not; if you can, you
should look at the actual output of the application (Spark in this case) on
a slave, e.g. in

/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948/std{err,out}

The actual UUIDs and the run number (in this example '4') in the path can
differ from slave node to slave node.

Look into those stderr and stdout files and you'll probably have your answer
as to why it is failing.
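
If you don't know the UUIDs up front, a quick way to locate the sandbox logs
on a slave is something like this (assuming the default /tmp/mesos work_dir
used above):

    # find every task sandbox log file on this slave
    find /tmp/mesos/slaves -name stderr -o -name stdout

    # then read the relevant ones, e.g.
    tail -n +1 /tmp/mesos/slaves/*/frameworks/*/executors/*/runs/latest/std{err,out}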





Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by seglo <wl...@gmail.com>.
Thanks for the response.  I'll admit I'm rather new to Mesos.  Due to the
nature of my setup I can't use the Mesos web portal effectively: I'm not
connected by VPN, so the local-network links on the mesos-master dashboard
that I SSH-tunnelled to aren't working.

Anyway, I was able to dig up some logs on one of my slaves for a failed run
of framework "20150322-040336-606645514-5050-2744-0037":

$ cat mesos-slave.INFO | grep 20150322-040336-606645514-5050-2744-0037

I0329 20:34:26.004115  2524 slave.cpp:1083] Got assigned task 1 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.004812  2524 slave.cpp:1193] Launching task 1 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.005879  2524 slave.cpp:3997] Launching executor 1 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.006145  2524 slave.cpp:1316] Queuing task '1' for executor 1
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.006722  2531 containerizer.cpp:424] Starting container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b' for executor '1' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.089171  2529 slave.cpp:2840] Monitoring executor '1' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'79cf96ba-bf58-45cd-927b-f6c864f6e44b'
I0329 20:34:26.108610  2529 slave.cpp:1860] Got registration for executor
'1' of framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.109136  2529 slave.cpp:1979] Flushing queued task 1 for
executor '1' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.112584  2527 slave.cpp:2215] Handling status update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.112751  2527 status_update_manager.cpp:317] Received status
update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.113052  2527 slave.cpp:2458] Forwarding the update
TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.113131  2527 slave.cpp:2391] Sending acknowledgement for
status update TASK_RUNNING (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.115972  2527 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: 61e6f703-ae25-4e31-88a7-0464b8bd8249) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.214292  2530 slave.cpp:2215] Handling status update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:52410
I0329 20:34:26.215005  2526 status_update_manager.cpp:317] Received status
update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.215144  2526 slave.cpp:2458] Forwarding the update
TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task 1 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.215277  2526 slave.cpp:2391] Sending acknowledgement for
status update TASK_FAILED (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for
task 1 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:52410
I0329 20:34:26.222218  2524 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: f91beb0f-3099-4313-97b7-25f7ff69913c) for task
1 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239357  2524 slave.cpp:1083] Got assigned task 4 for
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.239853  2524 slave.cpp:1193] Launching task 4 for framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.240880  2524 slave.cpp:3997] Launching executor 4 of
framework 20150322-040336-606645514-5050-2744-0037 in work directory
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.241065  2524 slave.cpp:1316] Queuing task '4' for executor 4
of framework '20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.241554  2528 containerizer.cpp:424] Starting container
'e3cf195d-525b-4148-aa38-1789d378a948' for executor '4' of framework
'20150322-040336-606645514-5050-2744-0037'
I0329 20:34:26.292538  2527 slave.cpp:2840] Monitoring executor '4' of
framework '20150322-040336-606645514-5050-2744-0037' in container
'e3cf195d-525b-4148-aa38-1789d378a948'
I0329 20:34:26.313694  2527 slave.cpp:1860] Got registration for executor
'4' of framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.314398  2527 slave.cpp:1979] Flushing queued task 4 for
executor '4' of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.324579  2531 slave.cpp:2215] Handling status update
TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.324774  2527 status_update_manager.cpp:317] Received status
update TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.325001  2531 slave.cpp:2458] Forwarding the update
TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.325150  2531 slave.cpp:2391] Sending acknowledgement for
status update TASK_RUNNING (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for
task 4 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:55646
I0329 20:34:26.328096  2529 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: 0a6624b9-74a2-44df-b1e9-007d89602e68) for task
4 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.425070  2526 slave.cpp:2215] Handling status update
TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 from
executor(1)@10.217.7.180:55646
I0329 20:34:26.425870  2526 status_update_manager.cpp:317] Received status
update TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4
of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:26.426008  2526 slave.cpp:2458] Forwarding the update
TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task 4 of
framework 20150322-040336-606645514-5050-2744-0037 to
master@10.173.40.36:5050
I0329 20:34:26.426118  2526 slave.cpp:2391] Sending acknowledgement for
status update TASK_FAILED (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for
task 4 of framework 20150322-040336-606645514-5050-2744-0037 to
executor(1)@10.217.7.180:55646
I0329 20:34:26.429636  2528 status_update_manager.cpp:389] Received status
update acknowledgement (UUID: e4a656cb-1be6-4875-b2bd-e2d756c78c11) for task
4 of framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.306196  2528 slave.cpp:2898] Executor '1' of framework
20150322-040336-606645514-5050-2744-0037 exited with status 0
I0329 20:34:27.306296  2528 slave.cpp:3007] Cleaning up executor '1' of
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.306550  2531 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1/runs/79cf96ba-bf58-45cd-927b-f6c864f6e44b'
for gc 6.99999645247704days in the future
I0329 20:34:27.306653  2531 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/1'
for gc 6.99999645160889days in the future
I0329 20:34:27.503298  2524 slave.cpp:2898] Executor '4' of framework
20150322-040336-606645514-5050-2744-0037 exited with status 0
I0329 20:34:27.503384  2524 slave.cpp:3007] Cleaning up executor '4' of
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503510  2526 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4/runs/e3cf195d-525b-4148-aa38-1789d378a948'
for gc 6.99999417290667days in the future
I0329 20:34:27.503553  2524 slave.cpp:3084] Cleaning up framework
20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503566  2526 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037/executors/4'
for gc 6.99999417236148days in the future
I0329 20:34:27.503608  2526 status_update_manager.cpp:279] Closing status
update streams for framework 20150322-040336-606645514-5050-2744-0037
I0329 20:34:27.503638  2524 gc.cpp:56] Scheduling
'/tmp/mesos/slaves/20150322-040336-606645514-5050-2744-S1/frameworks/20150322-040336-606645514-5050-2744-0037'
for gc 6.99999417116741days in the future
I0329 20:35:50.453316  2526 slave.cpp:1533] Asked to shut down framework
20150322-040336-606645514-5050-2744-0037 by master@10.173.40.36:5050
W0329 20:35:50.453419  2526 slave.cpp:1548] Cannot shut down unknown
framework 20150322-040336-606645514-5050-2744-0037
I0329 20:39:26.006376  2530 slave.cpp:3237] Framework
20150322-040336-606645514-5050-2744-0037 seems to have exited. Ignoring
registration timeout for executor '1'
I0329 20:39:26.241459  2524 slave.cpp:3237] Framework
20150322-040336-606645514-5050-2744-0037 seems to have exited. Ignoring
registration timeout for executor '4'

$ cat mesos-slave.WARNING | grep 20150322-040336-606645514-5050-2744-0037
W0329 20:35:50.453419  2526 slave.cpp:1548] Cannot shut down unknown
framework 20150322-040336-606645514-5050-2744-0037

There's nothing in mesos-slave.ERROR for this framework ID.






Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by hbogert <ha...@gmail.com>.
Hi, 

What do the Mesos slave logs say? Usually these give a clear-cut error; they
are probably local on a slave node.

I'm not sure about your config, so I can't pinpoint a specific path for you.

It might look something like:

/???/mesos/slaves/20150213-092641-84118794-5050-14978-S0/frameworks/20150329-232522-84118794-5050-18181-0000/executors/5/runs/latest/stderr







Re: Can't run spark-submit with an application jar on a Mesos cluster

Posted by seglo <wl...@gmail.com>.
The latter part of this question where I try to submit the application by
referring to it on HDFS is very similar to the recent question

Spark-submit not working when application jar is in hdfs
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-submit-not-working-when-application-jar-is-in-hdfs-td21840.html


