Posted to user@spark.apache.org by TJ Klein <TJ...@gmail.com> on 2014/11/02 02:51:55 UTC

org.apache.hadoop.security.UserGroupInformation.doAs Issue

Hi there,

I am trying to run the example code pi.py on a cluster, but so far I have only
got it working on localhost. When I try to run it in standalone mode with

./bin/spark-submit \
  --master spark://[mymaster]:7077 \
  examples/src/main/python/pi.py

I get warnings about resources and memory (the workstation actually has
192 GB of memory and 32 cores):

14/11/01 21:37:05 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/11/01 21:37:05 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/4 is now EXITED (Command exited with code 1)
14/11/01 21:37:05 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20141101213420-0000/4 removed: Command exited with code 1
14/11/01 21:37:05 INFO client.AppClient$ClientActor: Executor added:
app-20141101213420-0000/5 on worker-20141101213345-localhost-33525
(localhost:33525) with 32 cores
14/11/01 21:37:05 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20141101213420-0000/5 on hostPort localhost:33525 with 32 cores,
1024.0 MB RAM
14/11/01 21:37:05 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/5 is now RUNNING
14/11/01 21:37:20 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/11/01 21:37:35 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/11/01 21:37:38 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/5 is now EXITED (Command exited with code 1)
14/11/01 21:37:38 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20141101213420-0000/5 removed: Command exited with code 1
14/11/01 21:37:38 INFO client.AppClient$ClientActor: Executor added:
app-20141101213420-0000/6 on worker-20141101213345-localhost-33525
(localhost:33525) with 32 cores
14/11/01 21:37:38 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20141101213420-0000/6 on hostPort localhost:33525 with 32 cores,
1024.0 MB RAM
14/11/01 21:37:38 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/6 is now RUNNING
14/11/01 21:37:50 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/11/01 21:38:05 WARN scheduler.TaskSchedulerImpl: Initial job has not
accepted any resources; check your cluster UI to ensure that workers are
registered and have sufficient memory
14/11/01 21:38:11 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/6 is now EXITED (Command exited with code 1)
14/11/01 21:38:11 INFO cluster.SparkDeploySchedulerBackend: Executor
app-20141101213420-0000/6 removed: Command exited with code 1
14/11/01 21:38:11 INFO client.AppClient$ClientActor: Executor added:
app-20141101213420-0000/7 on worker-20141101213345-localhost-33525
(localhost:33525) with 32 cores
14/11/01 21:38:11 INFO cluster.SparkDeploySchedulerBackend: Granted executor
ID app-20141101213420-0000/7 on hostPort localhost:33525 with 32 cores,
1024.0 MB RAM
14/11/01 21:38:11 INFO client.AppClient$ClientActor: Executor updated:
app-20141101213420-0000/7 is now RUNNING
[..]



The worker connects to the master successfully and tries to run the
code:

14/11/01 21:39:17 INFO worker.Worker: Asked to launch executor
app-20141101213420-0000/9 for PythonPi
14/11/01 21:39:17 WARN worker.CommandUtils: SPARK_JAVA_OPTS was set on the
worker. It is deprecated in Spark 1.0.
14/11/01 21:39:17 WARN worker.CommandUtils: Set SPARK_LOCAL_DIRS for
node-specific storage locations.
14/11/01 21:39:17 INFO worker.ExecutorRunner: Launch command:
"/usr/lib/jvm/java-6-openjdk-amd64/jre/bin/java" "-cp"
"::/etc/hadoop/spark/spark-1.1.0/conf:/etc/hadoop/spark/spark-1.1.0/assembly/target/scala-2.10/spark-assembly-1.1.0-hadoop2.5.1.jar:/etc/hadoop/conf"
"-XX:MaxPermSize=128m" "-verbose:gc" "-XX:+PrintGCDetails"
"-XX:+PrintGCTimeStamps" "-Dspark.akka.frameSize=32"
"-Dspark.driver.port=47509" "-verbose:gc" "-XX:+PrintGCDetails"
"-XX:+PrintGCTimeStamps" "-Xms1024M" "-Xmx1024M"
"org.apache.spark.executor.CoarseGrainedExecutorBackend"
"akka.tcp://sparkDriver@localhost:47509/user/CoarseGrainedScheduler" "9"
"localhost" "32" "akka.tcp://sparkWorker@localhost:33525/user/Worker"
"app-20141101213420-0000"
14/11/01 21:39:50 INFO worker.Worker: Executor app-20141101213420-0000/9
finished with state EXITED message Command exited with code 1 exitStatus 1
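
The launch command above hands the executor two Akka addresses, one for the
driver and one for the worker, and both name localhost. Here is a minimal
sketch for pulling such addresses out of a pasted launch command so they are
easier to read. It assumes the command has been copied into a string; the
helper names (akka_endpoints, loopback_endpoints) are made up for this
illustration and are not part of Spark. It only lists what the log already
says and is not a diagnosis.

```python
import re

# Matches Akka TCP URLs such as
# akka.tcp://sparkDriver@localhost:47509/user/CoarseGrainedScheduler
AKKA_URL = re.compile(
    r"akka\.tcp://(?P<system>[\w.-]+)@(?P<host>[\w.-]+):(?P<port>\d+)"
)

# Host names treated as loopback for this illustration (an assumption).
LOOPBACK_NAMES = {"localhost", "127.0.0.1"}


def akka_endpoints(launch_command):
    """Return (system, host, port) for every akka.tcp URL found in the text."""
    return [
        (m.group("system"), m.group("host"), int(m.group("port")))
        for m in AKKA_URL.finditer(launch_command)
    ]


def loopback_endpoints(launch_command):
    """Return only the endpoints whose host is a loopback name."""
    return [e for e in akka_endpoints(launch_command) if e[1] in LOOPBACK_NAMES]


# Tail of the launch command from the worker log above.
sample = (
    '"akka.tcp://sparkDriver@localhost:47509/user/CoarseGrainedScheduler" "9" '
    '"localhost" "32" "akka.tcp://sparkWorker@localhost:33525/user/Worker"'
)

for system, host, port in loopback_endpoints(sample):
    print(f"{system} is advertised at {host}:{port}")
```

For the command above this lists sparkDriver at localhost:47509 and
sparkWorker at localhost:33525.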


Looking at the executor's log file in
/spark-1.1.0/work/app-20141101213420-0000/[..]/stderr, I see:

14/11/01 21:38:46 INFO Remoting: Remoting started; listening on addresses
:[akka.tcp://driverPropsFetcher@localhost:52163]
14/11/01 21:38:46 INFO Remoting: Remoting now listens on addresses:
[akka.tcp://driverPropsFetcher@localhost:52163]
14/11/01 21:38:46 INFO util.Utils: Successfully started service
'driverPropsFetcher' on port 52163.
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1629)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:52)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:113)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:156)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [30 seconds]
        at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
        at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
        at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
        at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
        at scala.concurrent.Await$.result(package.scala:107)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:125)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:53)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:52)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        ... 4 more


Does anybody have an idea how to resolve this? I am puzzled. The web UIs are
up and running and look reasonable.

Best,
 Tassilo



