Posted to user@spark.apache.org by java8964 <ja...@hotmail.com> on 2014/10/27 16:38:32 UTC

Problem running Spark as standalone

Hi, Spark Users:
I tried to test Spark on a standalone box, but I hit an issue and I don't know the root cause. I basically followed the documentation for deploying Spark in a standalone environment exactly.
1) I checked out the Spark source code for release 1.1.0.
2) I built Spark with the following command, which succeeded:
   ./make-distribution.sh -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests
3) I made sure I can ssh to localhost as myself using an ssh key.
4) I ran sbin/start-all.sh. It looked fine; at least I saw 2 Java processes running. (A sketch of what an explicit conf/spark-env.sh could look like follows this list.)
5) I ran the following command:
   yzhang@yzhang-linux:/opt/spark-1.1.0-bin-hadoop2.4.0/bin$ ./spark-shell --master spark://yzhang-linux:7077
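For reference, start-all.sh picks up conf/spark-env.sh if it exists. A minimal sketch of that file for a single-box setup, using the variable names documented in the Spark 1.1 standalone guide (the values below are illustrative for this box, not copied from my actual install), would be:

    # conf/spark-env.sh -- illustrative sketch only, not my actual file
    # Hostname or IP the master binds to and advertises in its spark:// URL
    export SPARK_MASTER_IP=yzhang-linux
    # Port for the master (7077 is the default)
    export SPARK_MASTER_PORT=7077
    # Address the Spark daemons on this machine bind to
    export SPARK_LOCAL_IP=192.168.240.8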
When I ran the spark-shell command above, I saw the following messages, and then the shell exited by itself.
14/10/27 11:22:53 INFO repl.SparkILoop: Created spark context..
Spark context available as sc.

scala> 14/10/27 11:23:13 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:33 INFO client.AppClient$ClientActor: Connecting to master spark://yzhang-linux:7077...
14/10/27 11:23:53 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
14/10/27 11:23:53 ERROR scheduler.TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.
Then I checked the log files and found the following messages in the master log:
14/10/27 11:22:53 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:13 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:33 ERROR remote.EndpointWriter: dropping message [class akka.actor.SelectChildName] for non-local recipient [Actor[akka.tcp://sparkMaster@yzhang-linux:7077/]] arriving at [akka.tcp://sparkMaster@yzhang-linux:7077] inbound addresses are [akka.tcp://sparkMaster@yzhang-linux:7077]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO actor.LocalActorRef: Message [akka.remote.transport.ActorTransportAdapter$DisassociateUnderlying] from Actor[akka://sparkMaster/deadLetters] to Actor[akka://sparkMaster/system/transports/akkaprotocolmanager.tcp0/akkaProtocol-tcp%3A%2F%2FsparkMaster%40192.168.240.8%3A63348-2#1992401281] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 ERROR remote.EndpointWriter: AssociationError [akka.tcp://sparkMaster@yzhang-linux:7077] -> [akka.tcp://sparkDriver@yzhang-linux:44017]: Error [Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkDriver@yzhang-linux:44017]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: Connection refused: yzhang-linux/192.168.240.8:44017]
14/10/27 11:23:53 INFO master.Master: akka.tcp://sparkDriver@yzhang-linux:44017 got disassociated, removing it.
Any idea why this is happening? The Spark web UI looks normal, and there is no error message in the worker log. This is a standalone box with no firewall, and the machine resolves its own hostname and IP without any problem.
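By "resolves without any problem" I mean checks along these lines (a sketch; the log path and file pattern are assumptions based on where this install writes its logs):

    # Hostname resolution, both via the resolver and a quick round trip
    getent hosts yzhang-linux
    ping -c 1 yzhang-linux
    # See which URL the master actually bound to at startup
    grep "Starting Spark master" /opt/spark-1.1.0-bin-hadoop2.4.0/logs/*Master*.out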
Thanks for your help.
Yong

RE: Problem running Spark as standalone

Posted by java8964 <ja...@hotmail.com>.
I did a little more research on this. It looks like the worker started successfully, but on port 40294; this is shown in both the worker log and the master web UI.
The question is why, in the master log, Akka is trying to connect to a different port (44017).
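If the randomly chosen driver port is part of the problem, one way to rule it out would be to pin it explicitly. This is only a sketch based on the Spark 1.1 configuration guide (spark.driver.port is a documented property; 7078 is an arbitrary value picked for illustration):

    # Sketch: fix the driver's Akka port instead of letting it pick a random one
    ./spark-shell --master spark://yzhang-linux:7077 \
      --conf spark.driver.port=7078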
Yong
