Posted to user@spark.apache.org by pishen tsai <pi...@gmail.com> on 2013/08/02 05:59:00 UTC

Failed to submit task when using sbt run

Hi,

I have a standalone Spark cluster running on a VPS, and have successfully
submitted some jobs by typing Scala commands in spark-shell.
Now I'm trying to submit a job from an sbt project and have run into some
problems.

I use *$ sbt package* to package the project on the Spark master machine.
The files are as follows:

*build.sbt*
name := "Simple Project"

version := "1.0"

scalaVersion := "2.9.3"

libraryDependencies += "org.spark-project" %% "spark-core" % "0.7.3"

libraryDependencies += "org.eclipse.jetty" % "jetty-server" %
"8.1.2.v20120308"

ivyXML :=
  <dependency org="org.eclipse.jetty.orbit" name="javax.servlet"
rev="3.0.0.v201112011016">
    <artifact name="javax.servlet" type="orbit" ext="jar"/>
  </dependency>

resolvers ++= Seq(
  "Akka Repository" at "http://repo.akka.io/releases/",
  "Spray Repository" at "http://repo.spray.cc/")

*SimpleJob.scala*
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val sc = new SparkContext("spark://glare***.***.com:7077",
      "Pishen's Test", "/opt/spark-0.7.2",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    val data = Array(1, 2, 3, 4, 5, 6, 7, 8, 9)
    val distData = sc.parallelize(data)
    println(distData.reduce(_ + _))
    System.exit(0)
  }
}

When I run the "*run*" script provided in Spark's home folder on the jar
file produced by sbt, the job is submitted successfully.
But when I run *$ sbt run* directly in the project's base folder, the job
fails with the following messages:

[info] Set current project to Simple Project (in build
file:/home/pishen/simple-job/)
[info] Running SimpleJob
13/08/01 05:09:34 INFO slf4j.Slf4jEventHandler: Slf4jEventHandler started
13/08/01 05:09:34 INFO spark.SparkEnv: Registering BlockManagerMaster
13/08/01 05:09:34 INFO storage.MemoryStore: MemoryStore started with
capacity 971.5 MB.
13/08/01 05:09:34 INFO storage.DiskStore: Created local directory at
/tmp/spark-local-20130801050934-a238
13/08/01 05:09:34 INFO network.ConnectionManager: Bound socket to port
33351 with id = ConnectionManagerId(glarehair.***.com,33351)
13/08/01 05:09:34 INFO storage.BlockManagerMaster: Trying to register
BlockManager
13/08/01 05:09:34 INFO storage.BlockManagerMaster: Registered BlockManager
13/08/01 05:09:34 INFO server.Server: jetty-8.1.2.v20120308
13/08/01 05:09:34 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:44593
13/08/01 05:09:34 INFO broadcast.HttpBroadcast: Broadcast server started at
http://10.***.***.220:44593
13/08/01 05:09:34 INFO spark.SparkEnv: Registering MapOutputTracker
13/08/01 05:09:34 INFO spark.HttpFileServer: HTTP File server directory is
/tmp/spark-c1dc7d60-10a2-4842-bc2b-2d100d305f7c
13/08/01 05:09:34 INFO server.Server: jetty-8.1.2.v20120308
13/08/01 05:09:34 INFO server.AbstractConnector: Started
SocketConnector@0.0.0.0:52937
13/08/01 05:09:34 INFO io.IoWorker: IoWorker thread 'spray-io-worker-0'
started
13/08/01 05:09:35 INFO server.HttpServer:
akka://spark/user/BlockManagerHTTPServer started on /0.0.0.0:33253
13/08/01 05:09:35 INFO storage.BlockManagerUI: Started BlockManager web UI
at http://glarehair.***.com:33253
13/08/01 05:09:35 INFO spark.SparkContext: Added JAR
target/scala-2.9.3/simple-project_2.9.3-1.0.jar at
http://10.***.***.220:52937/jars/simple-project_2.9.3-1.0.jar
with timestamp 1375333775366
13/08/01 05:09:35 INFO client.Client$ClientActor: Connecting to master
spark://glarehair.***.com:7077
13/08/01 05:09:35 INFO cluster.SparkDeploySchedulerBackend: Connected to
Spark cluster with app ID app-20130801050935-0014
13/08/01 05:09:35 INFO client.Client$ClientActor: Executor added:
app-20130801050935-0014/0 on
worker-20130801012404-decideshides.corp.***.com-55546
(decideshides.***.com) with 2 cores
13/08/01 05:09:35 INFO cluster.SparkDeploySchedulerBackend: Granted
executor ID app-20130801050935-0014/0 on host decideshides.***.com with 2
cores, 512.0 MB RAM
...
13/08/01 05:09:35 INFO client.Client$ClientActor: Executor added:
app-20130801050935-0014/8 on
worker-20130801012409-standexpand.***.com-40599 (standexpand.***.com) with
2 cores
13/08/01 05:09:35 INFO cluster.SparkDeploySchedulerBackend: Granted
executor ID app-20130801050935-0014/8 on host standexpand.***.com with 2
cores, 512.0 MB RAM
13/08/01 05:09:35 INFO spark.SparkContext: Starting job: reduce at
SimpleJob.scala:10
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Got job 0 (reduce at
SimpleJob.scala:10) with 2 output partitions (allowLocal=false)
13/08/01 05:09:35 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/1 is now RUNNING
13/08/01 05:09:35 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/6 is now RUNNING
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Final stage: Stage 0
(parallelize at SimpleJob.scala:9)
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Parents of final stage:
List()
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Missing parents: List()
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Submitting Stage 0
(ParallelCollectionRDD[0] at parallelize at SimpleJob.scala:9), which has
no missing parents
13/08/01 05:09:35 INFO scheduler.DAGScheduler: Submitting 2 missing tasks
from Stage 0 (ParallelCollectionRDD[0] at parallelize at SimpleJob.scala:9)
13/08/01 05:09:36 INFO cluster.ClusterScheduler: Adding task set 0.0 with 2
tasks
13/08/01 05:09:36 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/5 is now RUNNING
...
13/08/01 05:09:36 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/7 is now RUNNING
13/08/01 05:09:36 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/1 is now FAILED (Command exited with code 1)
13/08/01 05:09:36 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20130801050935-0014/1 removed: Command exited with code 1
13/08/01 05:09:36 INFO client.Client$ClientActor: Executor added:
app-20130801050935-0014/9 on
worker-20130801012400-assureobscure.***.com-55139 (assureobscure.***.com)
with 2 cores
13/08/01 05:09:36 INFO cluster.SparkDeploySchedulerBackend: Granted
executor ID app-20130801050935-0014/9 on host assureobscure.***.com with 2
cores, 512.0 MB RAM
13/08/01 05:09:36 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/9 is now RUNNING
...
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/12 is now RUNNING
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/9 is now FAILED (Command exited with code 1)
13/08/01 05:09:37 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20130801050935-0014/9 removed: Command exited with code 1
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor added:
app-20130801050935-0014/17 on
worker-20130801012400-assureobscure.***.com-55139 (assureobscure.***.com)
with 2 cores
13/08/01 05:09:37 INFO cluster.SparkDeploySchedulerBackend: Granted
executor ID app-20130801050935-0014/17 on host assureobscure.***.com with 2
cores, 512.0 MB RAM
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/13 is now RUNNING
...
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/16 is now RUNNING
13/08/01 05:09:37 INFO client.Client$ClientActor: Executor updated:
app-20130801050935-0014/6 is now FAILED (Command exited with code 1)
13/08/01 05:09:37 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20130801050935-0014/6 removed: Command exited with code 1
13/08/01 05:09:37 ERROR client.Client$ClientActor: Master removed our
application: FAILED; stopping client
13/08/01 05:09:37 ERROR cluster.SparkDeploySchedulerBackend: Disconnected
from Spark cluster!
13/08/01 05:09:37 INFO scheduler.DAGScheduler: Failed to run reduce at
SimpleJob.scala:10
[error] (run-main) spark.SparkException: Job failed: Error: Disconnected
from Spark cluster
spark.SparkException: Job failed: Error: Disconnected from Spark cluster
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:642)
    at
spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:640)
    at
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:60)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:640)
    at spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:303)
    at
spark.scheduler.DAGScheduler.spark$scheduler$DAGScheduler$$run(DAGScheduler.scala:364)
    at spark.scheduler.DAGScheduler$$anon$1.run(DAGScheduler.scala:107)
13/08/01 05:09:37 INFO network.ConnectionManager: Selector thread was
interrupted!
java.lang.RuntimeException: Nonzero exit code: 1
    at scala.sys.package$.error(package.scala:27)
[error] {file:/home/pishen/simple-job/}default-918c10/compile:run: Nonzero
exit code: 1
[error] Total time: 5 s, completed Aug 1, 2013 5:09:37 AM

All the stdout and stderr logs on the Spark master's web dashboard show
only:

The server could not handle the request in the appropriate time frame
(async timeout)

What component is missing when running via *$ sbt run*?
I guess it's a classpath issue, or maybe some other Spark configuration
problem, but it's hard to pinpoint the exact cause by reading through the
run script.
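One thing I plan to try (just a sketch, not verified against this cluster)
is telling sbt to fork a separate JVM for *run*, so the job doesn't inherit
sbt's own classpath and JVM settings:

```scala
// build.sbt additions (sbt 0.12 syntax) -- a sketch, not verified
// against this cluster; the memory value below is arbitrary.

// Run the main class in a forked JVM instead of inside sbt's own JVM,
// so the application gets its own classpath and JVM options.
fork in run := true

// Give the forked driver JVM some headroom (value chosen arbitrarily).
javaOptions in run += "-Xmx1g"
```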

By the way, is it possible to submit the job from a machine other than the
Spark master? (It looks like it should be, since the master's URL is
already passed to SparkContext.)

Thanks in advance for your help,
Pishen Tsai