Posted to user@spark.apache.org by ZhuGe <tc...@outlook.com> on 2015/03/31 09:12:30 UTC

workers no route to host

Hi,
I set up a standalone cluster of 5 machines (tmaster, tslave1-4) with spark-1.3.0-cdh5.4.0-SNAPSHOT. When I execute sbin/start-all.sh, the master starts fine, but I can't see the web UI. Moreover, the worker logs look like this:

Spark assembly has been built with Hive, including Datanucleus jars on classpath
/data/PlatformDep/cdh5/dist/bin/compute-classpath.sh: line 164: hadoop: command not found
Spark Command: java -cp :/data/PlatformDep/cdh5/dist/sbin/../conf:/data/PlatformDep/cdh5/dist/lib/spark-assembly-1.3.0-cdh5.4.0-SNAPSHOT-hadoop2.6.0-cdh5.4.0-SNAPSHOT.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-rdbms-3.2.1.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-api-jdo-3.2.1.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-core-3.2.2.jar: -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m org.apache.spark.deploy.worker.Worker spark://192.168.128.16:7071 --webui-port 8081
========================================

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/31 06:47:22 INFO Worker: Registered signal handlers for [TERM, HUP, INT]
15/03/31 06:47:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/03/31 06:47:23 INFO SecurityManager: Changing view acls to: dcadmin
15/03/31 06:47:23 INFO SecurityManager: Changing modify acls to: dcadmin
15/03/31 06:47:23 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(dcadmin); users with modify permissions: Set(dcadmin)
15/03/31 06:47:23 INFO Slf4jLogger: Slf4jLogger started
15/03/31 06:47:23 INFO Remoting: Starting remoting
15/03/31 06:47:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkWorker@tslave2:60815]
15/03/31 06:47:24 INFO Utils: Successfully started service 'sparkWorker' on port 60815.
15/03/31 06:47:24 INFO Worker: Starting Spark worker tslave2:60815 with 2 cores, 3.0 GB RAM
15/03/31 06:47:24 INFO Worker: Running Spark version 1.3.0
15/03/31 06:47:24 INFO Worker: Spark home: /data/PlatformDep/cdh5/dist
15/03/31 06:47:24 INFO Server: jetty-8.y.z-SNAPSHOT
15/03/31 06:47:24 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:8081
15/03/31 06:47:24 INFO Utils: Successfully started service 'WorkerUI' on port 8081.
15/03/31 06:47:24 INFO WorkerWebUI: Started WorkerWebUI at http://tslave2:8081
15/03/31 06:47:24 INFO Worker: Connecting to master akka.tcp://sparkMaster@192.168.128.16:7071/user/Master...
15/03/31 06:47:24 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://sparkMaster@192.168.128.16:7071]: Error [Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host
]
15/03/31 06:47:24 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://sparkMaster@192.168.128.16:7071]: Error [Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host
]
15/03/31 06:47:24 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://sparkMaster@192.168.128.16:7071]: Error [Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]
Caused by: akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No route to host
]
15/03/31 06:47:24 ERROR EndpointWriter: AssociationError [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://sparkMaster@192.168.128.16:7071]: Error [Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]] [akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster@192.168.128.16:7071]

The worker machines can ping the master machine successfully. The hosts file is like this:

192.168.128.16 tmaster tmaster
192.168.128.17 tslave1 tslave1
192.168.128.18 tslave2 tslave2
192.168.128.19 tslave3 tslave3
192.168.128.20 tslave4 tslave4

Hope someone could help. Thanks
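
Since ping between the machines works but the Akka association fails with "No route to host", the failure is most likely at the TCP level (for example a host firewall on the master) rather than name resolution. A quick check from one of the workers might look roughly like this (a sketch only, assuming netcat, ss, and iptables are available on these hosts):

    # from a worker: can we reach the master's RPC port at all?
    $ nc -zv 192.168.128.16 7071

    # on tmaster: is the Master actually listening on 7071, and is a firewall rule rejecting it?
    $ ss -tlnp | grep 7071
    $ sudo iptables -L -n

If nc fails while ping succeeds, a host firewall (iptables/firewalld) on tmaster blocking TCP 7071 is the usual suspect.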

Re: workers no route to host

Posted by Dean Wampler <de...@gmail.com>.
It appears you are using a Cloudera Spark build, 1.3.0-cdh5.4.0-SNAPSHOT,
which expects to find the hadoop command:

/data/PlatformDep/cdh5/dist/bin/compute-classpath.sh: line 164: hadoop:
command not found

If you don't want to use Hadoop, download one of the pre-built Spark
releases from spark.apache.org. Even the Hadoop builds there will work
okay, as they don't actually attempt to run Hadoop commands.
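
For instance, grabbing the 1.3.0 release pre-built for Hadoop 2.4 and starting the standalone cluster from it would look roughly like this (the URL assumes the Apache archive layout; pick the package that matches your Hadoop version):

    $ wget https://archive.apache.org/dist/spark/spark-1.3.0/spark-1.3.0-bin-hadoop2.4.tgz
    $ tar xzf spark-1.3.0-bin-hadoop2.4.tgz
    $ cd spark-1.3.0-bin-hadoop2.4
    # list the worker hostnames in conf/slaves, then:
    $ ./sbin/start-all.sh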


Dean Wampler, Ph.D.
Author: Programming Scala, 2nd Edition
<http://shop.oreilly.com/product/0636920033073.do> (O'Reilly)
Typesafe <http://typesafe.com>
@deanwampler <http://twitter.com/deanwampler>
http://polyglotprogramming.com

On Tue, Mar 31, 2015 at 3:12 AM, ZhuGe <tc...@outlook.com> wrote:

> Hi,
> I set up a standalone cluster of 5 machines (tmaster, tslave1-4) with
> spark-1.3.0-cdh5.4.0-SNAPSHOT.
> When I execute sbin/start-all.sh, the master starts fine, but I can't see
> the web UI. Moreover, the worker logs look like this:
>
> Spark assembly has been built with Hive, including Datanucleus jars on
> classpath
> /data/PlatformDep/cdh5/dist/bin/compute-classpath.sh: line 164: hadoop:
> command not found
> Spark Command: java -cp
> :/data/PlatformDep/cdh5/dist/sbin/../conf:/data/PlatformDep/cdh5/dist/lib/spark-assembly-1.3.0-cdh5.4.0-SNAPSHOT-hadoop2.6.0-cdh5.4.0-SNAPSHOT.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-rdbms-3.2.1.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-api-jdo-3.2.1.jar:/data/PlatformDep/cdh5/dist/lib/datanucleus-core-3.2.2.jar:
> -XX:MaxPermSize=128m -Dspark.akka.logLifecycleEvents=true -Xms512m -Xmx512m
> org.apache.spark.deploy.worker.Worker spark://192.168.128.16:7071
> --webui-port 8081
> ========================================
>
> Using Spark's default log4j profile:
> org/apache/spark/log4j-defaults.properties
> 15/03/31 06:47:22 INFO Worker: Registered signal handlers for [TERM, HUP,
> INT]
> 15/03/31 06:47:23 WARN NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> 15/03/31 06:47:23 INFO SecurityManager: Changing view acls to: dcadmin
> 15/03/31 06:47:23 INFO SecurityManager: Changing modify acls to: dcadmin
> 15/03/31 06:47:23 INFO SecurityManager: SecurityManager: authentication
> disabled; ui acls disabled; users with view permissions: Set(dcadmin);
> users with modify permissions: Set(dcadmin)
> 15/03/31 06:47:23 INFO Slf4jLogger: Slf4jLogger started
> 15/03/31 06:47:23 INFO Remoting: Starting remoting
> 15/03/31 06:47:23 INFO Remoting: Remoting started; listening on addresses
> :[akka.tcp://sparkWorker@tslave2:60815]
> 15/03/31 06:47:24 INFO Utils: Successfully started service 'sparkWorker'
> on port 60815.
> 15/03/31 06:47:24 INFO Worker: Starting Spark worker tslave2:60815 with 2
> cores, 3.0 GB RAM
> 15/03/31 06:47:24 INFO Worker: Running Spark version 1.3.0
> 15/03/31 06:47:24 INFO Worker: Spark home: /data/PlatformDep/cdh5/dist
> 15/03/31 06:47:24 INFO Server: jetty-8.y.z-SNAPSHOT
> 15/03/31 06:47:24 INFO AbstractConnector: Started
> SelectChannelConnector@0.0.0.0:8081
> 15/03/31 06:47:24 INFO Utils: Successfully started service 'WorkerUI' on
> port 8081.
> 15/03/31 06:47:24 INFO WorkerWebUI: Started WorkerWebUI at
> http://tslave2:8081
> 15/03/31 06:47:24 INFO Worker: Connecting to master akka.tcp://
> sparkMaster@192.168.128.16:7071/user/Master...
> 15/03/31 06:47:24 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://
> sparkMaster@192.168.128.16:7071]: Error [Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No
> route to host
> ]
> 15/03/31 06:47:24 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://
> sparkMaster@192.168.128.16:7071]: Error [Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No
> route to host
> ]
> 15/03/31 06:47:24 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://
> sparkMaster@192.168.128.16:7071]: Error [Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]
> Caused by:
> akka.remote.transport.netty.NettyTransport$$anonfun$associate$1$$anon$2: No
> route to host
> ]
> 15/03/31 06:47:24 ERROR EndpointWriter: AssociationError
> [akka.tcp://sparkWorker@tslave2:60815] -> [akka.tcp://
> sparkMaster@192.168.128.16:7071]: Error [Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]] [
> akka.remote.EndpointAssociationException: Association failed with
> [akka.tcp://sparkMaster@192.168.128.16:7071]
>
>
>
> The worker machines can ping the master machine successfully.
> The hosts file is like this:
> 192.168.128.16 tmaster tmaster
> 192.168.128.17 tslave1 tslave1
> 192.168.128.18 tslave2 tslave2
> 192.168.128.19 tslave3 tslave3
> 192.168.128.20 tslave4 tslave4
>
> Hope someone could help.
> Thanks
>