Posted to user@spark.apache.org by Alexander Pivovarov <ap...@gmail.com> on 2015/08/28 00:07:38 UTC

TimeoutException on start-slave spark 1.4.0

I see the following error from time to time when trying to start slaves on Spark 1.4.0:


[hadoop@ip-10-0-27-240 apps]$ pwd
/mnt/var/log/apps

[hadoop@ip-10-0-27-240 apps]$ cat spark-hadoop-org.apache.spark.deploy.worker.Worker-1-ip-10-0-27-240.ec2.internal.out
Spark Command: /usr/java/latest/bin/java -cp
/home/hadoop/spark/conf/:/home/hadoop/conf/:/home/hadoop/spark/classpath/distsupplied/*:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar:/usr/share/aws/emr/auxlib/*:/home/hadoop/.versions/spark-1.4.0.b/sbin/../conf/:/home/hadoop/.versions/spark-1.4.0.b/lib/spark-assembly-1.4.0-hadoop2.4.0.jar:/home/hadoop/.versions/spark-1.4.0.b/lib/datanucleus-core-3.2.10.jar:/home/hadoop/.versions/spark-1.4.0.b/lib/datanucleus-rdbms-3.2.9.jar:/home/hadoop/.versions/spark-1.4.0.b/lib/datanucleus-api-jdo-3.2.6.jar:/home/hadoop/conf/:/home/hadoop/conf/
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps
-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70
-XX:MaxHeapFreeRatio=70 -Xms2048m -Xmx2048m -XX:MaxPermSize=128m
org.apache.spark.deploy.worker.Worker --webui-port 8081
spark://ip-10-0-27-185.ec2.internal:7077
========================================
15/08/27 21:10:25 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/08/27 21:10:26 INFO SecurityManager: Changing view acls to: hadoop
15/08/27 21:10:26 INFO SecurityManager: Changing modify acls to: hadoop
15/08/27 21:10:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); users with modify permissions: Set(hadoop)
15/08/27 21:10:26 INFO Slf4jLogger: Slf4jLogger started
15/08/27 21:10:26 INFO Remoting: Starting remoting
Exception in thread "main" java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:107)
at akka.remote.Remoting.start(Remoting.scala:180)
at akka.remote.RemoteActorRefProvider.init(RemoteActorRefProvider.scala:184)
at akka.actor.ActorSystemImpl.liftedTree2$1(ActorSystem.scala:618)
at akka.actor.ActorSystemImpl._start$lzycompute(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl._start(ActorSystem.scala:615)
at akka.actor.ActorSystemImpl.start(ActorSystem.scala:632)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:141)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:118)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:122)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:54)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1991)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1982)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:56)
at org.apache.spark.deploy.worker.Worker$.startSystemAndActor(Worker.scala:553)
at org.apache.spark.deploy.worker.Worker$.main(Worker.scala:533)
at org.apache.spark.deploy.worker.Worker.main(Worker.scala)
15/08/27 21:10:39 INFO Utils: Shutdown hook called
Heap
 par new generation   total 613440K, used 338393K [0x0000000778000000, 0x00000007a1990000, 0x00000007a1990000)
  eden space 545344K,  62% used [0x0000000778000000, 0x000000078ca765b0, 0x0000000799490000)
  from space 68096K,   0% used [0x0000000799490000, 0x0000000799490000, 0x000000079d710000)
  to   space 68096K,   0% used [0x000000079d710000, 0x000000079d710000, 0x00000007a1990000)
 concurrent mark-sweep generation total 1415616K, used 0K [0x00000007a1990000, 0x00000007f8000000, 0x00000007f8000000)
 concurrent-mark-sweep perm gen total 21248K, used 19285K [0x00000007f8000000, 0x00000007f94c0000, 0x0000000800000000)

Re: TimeoutException on start-slave spark 1.4.0

Posted by Alexander Pivovarov <ap...@gmail.com>.
I have a workaround for the issue.

As you can see from the log, about 14 seconds pass between worker start (21:10:25) and the shutdown hook (21:10:39), which fits the 10-second future timeout plus JVM startup and teardown.

The workaround is to call start-slave, sleep 30 seconds, check whether the worker is running, and if it is not, run start-slave again.

Here is the relevant part of my EMR Spark bootstrap Python script:

spark_master = "spark://...:7077"
...
curl_worker_cmd = "curl -o /dev/null --silent --head --write-out
'%{http_code}' localhost:8081"
while True:
    subprocess.call(["/home/hadoop/spark/sbin/start-slave.sh",spark_master])
    time.sleep(30)
    if subprocess.Popen(curl_worker_cmd.split(" "),
stdout=subprocess.PIPE).communicate()[0] == "'200'":
        break
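
One caveat with the loop above: if the worker can never come up (say, a wrong master URL or a dead master), it will retry forever and hang the bootstrap. Here is a bounded variant of the same idea (just a sketch, untested on EMR: max_attempts is an arbitrary cap and worker_is_up is a helper name I made up; everything else mirrors the snippet above):

import subprocess
import time

spark_master = "spark://...:7077"  # same elided master URL as above
max_attempts = 5                   # arbitrary cap; tune for your boot times

def worker_is_up():
    # "200" from the worker web UI (default port 8081) means the worker is serving;
    # comparing against b"200" keeps this working under both Python 2 and 3
    cmd = ["curl", "-o", "/dev/null", "--silent", "--head",
           "--write-out", "%{http_code}", "localhost:8081"]
    return subprocess.Popen(cmd, stdout=subprocess.PIPE).communicate()[0].strip() == b"200"

for attempt in range(max_attempts):
    subprocess.call(["/home/hadoop/spark/sbin/start-slave.sh", spark_master])
    time.sleep(30)
    if worker_is_up():
        break
else:
    # for/else: this branch runs only if no attempt ever hit break
    raise RuntimeError("Spark worker failed to start after %d attempts" % max_attempts)

That way a genuinely broken node fails the bootstrap loudly instead of sitting in the loop.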
