Posted to user@spark.apache.org by Nathan Kronenfeld <nk...@oculusinfo.com> on 2014/02/25 06:21:05 UTC
spark failure

I'm using Spark 0.8.1 and trying to run a job from a new remote client (it
works fine when run directly from the master).

When I try to run it, the job just fails without doing anything.

Unfortunately, I also can't find anywhere where it tells me why it fails.
I'll add the relevant bits of the logs below, but there really isn't much.

Does anyone know how to tell why it's failing? I assume it must be getting
an exception somewhere, but it isn't telling me about it.

On the client, I see:
14/02/24 23:44:43 INFO Client$ClientActor: Executor added:
app-20140224234441-0003/4 on
worker-20140224140443-hadoop-s2.oculus.local-40819
(hadoop-s2.oculus.local:7077) with 32 cores
14/02/24 23:44:43 INFO SparkDeploySchedulerBackend: Granted executor ID
app-20140224234441-0003/4 on hostPort hadoop-s2.oculus.local:7077 with 32
cores, 200.0 GB RAM
14/02/24 23:44:43 INFO Client$ClientActor: Executor updated:
app-20140224234441-0003/4 is now RUNNING
14/02/24 23:44:43 INFO FileInputFormat: Total input paths to process : 200
14/02/24 23:44:43 INFO Client$ClientActor: Executor updated:
app-20140224234441-0003/1 is now FAILED (Command exited with code 1)
14/02/24 23:44:43 INFO SparkDeploySchedulerBackend: Executor
app-20140224234441-0003/1 removed: Command exited with code 1

The master log just has:
14/02/24 23:44:43 INFO master.Master: Launching executor
app-20140224234441-0003/4 on worker
worker-20140224140443-hadoop-s2.oculus.local-40819
14/02/24 23:44:45 INFO master.Master: Removing executor
app-20140224234441-0003/4 because it is FAILED

(no other mention of 0003/4)

The worker log has:
14/02/24 23:44:43 INFO worker.Worker: Asked to launch executor
app-20140224234441-0003/4 for Pyramid Binning(ndk)
14/02/24 23:44:43 INFO worker.ExecutorRunner: Launch command:
"/usr/java/jdk1.7.0_25-cloudera/bin/java" "-cp"
"math-utilities-0.2.jar:binning-utilities-0.2.jar:tile-generation-0.2.jar:hbase-client-0.95.2-cdh5.0.0-beta-1.jar:hbase-protocol-0.95.2-cdh5.0.0-beta-1.jar:hbase-common-0.95.2-cdh5.0.0-beta-1.jar:htrace-core-2.01.jar:avro-1.7.4.jar:commons-compress-1.4.1.jar:scala-library-2.9.3.jar:scala-compiler-2.9.3.jar:/opt/spark/conf:spark-assembly-0.8.1-incubating-hadoop2.2.0-mr1-cdh5.0.0-beta-1.jar"
"-Dspark.executor.memory=200G" "-Xms204800M" "-Xmx204800M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka://spark@hadoop-client.oculus.local:41101/user/CoarseGrainedScheduler"
"4" "hadoop-s2.oculus.local" "32" "app-20140224234441-0003"
14/02/24 23:44:45 INFO worker.Worker: Executor app-20140224234441-0003/4
finished with state FAILED message Command exited with code 1 exitStatus 1


Again, nothing else.
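
One place that may be worth checking, offered as a sketch rather than a
confirmed diagnosis: in standalone mode each worker normally keeps a stdout
and stderr file per executor under its work directory (SPARK_HOME/work by
default, in a subdirectory per application and executor). The helper below
only parses the "finished with state" line format shown in the worker log
above and builds the path where that stderr file would be expected. The
function name failed_executors and the /opt/spark/work default are
assumptions (the latter guessed from the /opt/spark/conf classpath entry),
not anything the logs confirm.

```python
import re
from pathlib import PurePosixPath

# Matches the worker's "finished with state" line shown above, once the
# mail client's line wrapping has been undone.
FINISHED = re.compile(
    r"Executor (app-\d+-\d+)/(\d+) finished with state (\w+)"
    r" message (.*?) exitStatus (\d+)"
)


def failed_executors(log_text, work_dir="/opt/spark/work"):
    """Return (app_id, executor_id, exit_status, stderr_path) per failed executor.

    work_dir is an ASSUMPTION: the standalone worker's default work directory
    is SPARK_HOME/work, and SPARK_HOME is guessed to be /opt/spark.
    """
    flat = " ".join(log_text.split())  # collapse wrapped lines into one
    results = []
    for app_id, exec_id, state, _message, status in FINISHED.findall(flat):
        if state == "FAILED":
            stderr = PurePosixPath(work_dir) / app_id / exec_id / "stderr"
            results.append((app_id, exec_id, int(status), str(stderr)))
    return results


# The worker log excerpt from this message, wrapped as it was pasted.
sample = """14/02/24 23:44:45 INFO worker.Worker: Executor app-20140224234441-0003/4
finished with state FAILED message Command exited with code 1 exitStatus 1"""

for row in failed_executors(sample):
    print(row)
```

If the file exists on hadoop-s2, it is where the executor JVM's own error
output would normally land, since the worker log only records the exit code.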

-- 
Nathan Kronenfeld
Senior Visualization Developer
Oculus Info Inc
2 Berkeley Street, Suite 600,
Toronto, Ontario M5A 4J5
Phone:  +1-416-203-3003 x 238
Email:  nkronenfeld@oculusinfo.com