Posted to user@spark.apache.org by Maxime Lemaire <ma...@wattgo.com> on 2013/11/22 13:43:41 UTC

Worker fails to connect when built with SPARK_HADOOP_VERSION=2.2.0

Hi,
I'm facing a strange issue.
When I build Spark with Hadoop 2.2.0 support, workers can't connect to
the Spark master anymore.
The network is up and hostnames resolve correctly. Tcpdump clearly shows
the workers connecting (tcpdump outputs linked at the end).

The same setup with a Spark build without SPARK_HADOOP_VERSION (or with
SPARK_HADOOP_VERSION=2.0.5-alpha) works fine!

Some details:

pmtx-master01 : master
pmtx-master02 : slave

(behavior is the same if I launch both master and slave from the same box)

Building with Hadoop 2.2.0 support:

Fresh install on pmtx-master01:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
... build successful
#

Fresh install on pmtx-master02:
# SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
... build successful
#

On pmtx-master01:
# ./bin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to
/cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out
# netstat -an | grep 7077
tcp6       0      0 10.90.XX.XX:7077        :::*                    LISTEN
#

On pmtx-master02:
# nc -v pmtx-master01 7077
pmtx-master01 [10.90.XX.XX] 7077 (?) open
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 10:57:50 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 10:57:50 INFO Worker: Starting Spark worker pmtx-master02:42271
with 8 cores, 22.6 GB RAM
13/11/22 10:57:50 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 10:57:50 INFO WorkerWebUI: Started Worker web UI at
http://pmtx-master02:8081
13/11/22 10:57:50 INFO Worker: Connecting to master
spark://pmtx-master01:7077
13/11/22 10:57:50 ERROR Worker: Connection to master failed! Shutting down.
#

With spark-shell on pmtx-master02:
# MASTER=spark://pmtx-master01:7077 ./spark-shell
Welcome to
  ____              __
 / __/__  ___ _____/ /__
_\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
  /_/

Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
Initializing interpreter...
Creating SparkContext...
13/11/22 11:19:29 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:19:29 INFO SparkEnv: Registering BlockManagerMaster
13/11/22 11:19:29 INFO MemoryStore: MemoryStore started with capacity 323.9
MB.
13/11/22 11:19:29 INFO DiskStore: Created local directory at
/tmp/spark-local-20131122111929-3e3c
13/11/22 11:19:29 INFO ConnectionManager: Bound socket to port 42249 with
id = ConnectionManagerId(pmtx-master02,42249)
13/11/22 11:19:29 INFO BlockManagerMaster: Trying to register BlockManager
13/11/22 11:19:29 INFO BlockManagerMaster: Registered BlockManager
13/11/22 11:19:29 INFO HttpBroadcast: Broadcast server started at
http://10.90.66.67:52531
13/11/22 11:19:29 INFO SparkEnv: Registering MapOutputTracker
13/11/22 11:19:29 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-40525f81-f883-45d5-92ad-bbff44ecf435
13/11/22 11:19:29 INFO SparkUI: Started Spark Web UI at
http://pmtx-master02:4040
13/11/22 11:19:29 INFO Client$ClientActor: Connecting to master
spark://pmtx-master01:7077
13/11/22 11:19:30 ERROR Client$ClientActor: Connection to master failed;
stopping client
13/11/22 11:19:30 ERROR SparkDeploySchedulerBackend: Disconnected from
Spark cluster!
13/11/22 11:19:30 ERROR ClusterScheduler: Exiting due to error from cluster
scheduler: Disconnected from Spark cluster

---- snip ----

WORKING: Building with Hadoop 2.0.5-alpha support

On pmtx-master01, now building with Hadoop 2.0.5-alpha:
# sbt/sbt clean
...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
...
# ./bin/start-master.sh
starting org.apache.spark.deploy.master.Master, logging to
/cluster/bin/spark-0.8.0-incubating/bin/../logs/spark-root-org.apache.spark.deploy.master.Master-1-pmtx-master01.out

Same build on pmtx-master02:
# sbt/sbt clean
... build successful ...
# SPARK_HADOOP_VERSION=2.0.5-alpha sbt/sbt assembly
... build successful ...
# ./spark-class org.apache.spark.deploy.worker.Worker spark://pmtx-master01:7077
13/11/22 11:25:34 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:25:34 INFO Worker: Starting Spark worker pmtx-master02:33768
with 8 cores, 22.6 GB RAM
13/11/22 11:25:34 INFO Worker: Spark home: /cluster/bin/spark
13/11/22 11:25:34 INFO WorkerWebUI: Started Worker web UI at
http://pmtx-master02:8081
13/11/22 11:25:34 INFO Worker: Connecting to master
spark://pmtx-master01:7077
13/11/22 11:25:34 INFO Worker: Successfully registered with master
#

With spark-shell on pmtx-master02:
# MASTER=spark://pmtx-master01:7077 ./spark-shell
Welcome to
  ____              __
 / __/__  ___ _____/ /__
_\ \/ _ \/ _ `/ __/  '_/
 /___/ .__/\_,_/_/ /_/\_\   version 0.8.0
  /_/

Using Scala version 2.9.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.6.0_31)
Initializing interpreter...
Creating SparkContext...
13/11/22 11:23:12 INFO Slf4jEventHandler: Slf4jEventHandler started
13/11/22 11:23:12 INFO SparkEnv: Registering BlockManagerMaster
13/11/22 11:23:12 INFO MemoryStore: MemoryStore started with capacity 323.9
MB.
13/11/22 11:23:12 INFO DiskStore: Created local directory at
/tmp/spark-local-20131122112312-3d8b
13/11/22 11:23:12 INFO ConnectionManager: Bound socket to port 58826 with
id = ConnectionManagerId(pmtx-master02,58826)
13/11/22 11:23:12 INFO BlockManagerMaster: Trying to register BlockManager
13/11/22 11:23:12 INFO BlockManagerMaster: Registered BlockManager
13/11/22 11:23:12 INFO HttpBroadcast: Broadcast server started at
http://10.90.66.67:39067
13/11/22 11:23:12 INFO SparkEnv: Registering MapOutputTracker
13/11/22 11:23:12 INFO HttpFileServer: HTTP File server directory is
/tmp/spark-ded7bcc1-bacf-4158-b20f-5b2fa6936e8b
13/11/22 11:23:12 INFO SparkUI: Started Spark Web UI at
http://pmtx-master02:4040
13/11/22 11:23:12 INFO Client$ClientActor: Connecting to master
spark://pmtx-master01:7077
Spark context available as sc.
13/11/22 11:23:12 INFO SparkDeploySchedulerBackend: Connected to Spark
cluster with app ID app-20131122112312-0000
Type in expressions to have them evaluated.
Type :help for more information.
scala>
#

Please be aware that I really don't know the Spark communication protocol,
so forgive me if I'm misunderstanding something; I'm only making
assumptions about what's happening.
As you can see in the tcpdump output, when the connection fails the slave
sends empty data packets (TCP header only, no PSH flag, length 0) at the
point where it should start the conversation by saying "hello, I am
sparkWorker pmtx-master02" (4th packet, line 19 of the capture).

Tcpdump output:
Connection failed (Hadoop 2.2.0): http://pastebin.com/6N8tEgUf
Connection successful (Hadoop 2.0.5-alpha): http://pastebin.com/CegYAjMj

Also, I'm not familiar with log4j, so if you have tips to get more log
information I will try them (I'm using the default properties in
log4j.properties).
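The only thing I found so far is the bundled conf/log4j.properties.template;
here is a sketch of what I plan to try, just raising the root category to
DEBUG (property names assumed from the stock 0.8.0 template, so please
correct me if this is wrong):

```properties
# Sketch of a more verbose conf/log4j.properties.
# Names assumed from the stock 0.8.0 template; adjust if your file differs.
# Raising the root category from INFO to DEBUG should also log the
# Akka remoting activity during worker registration.
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```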

Hadoop 2.2.0 is great, Spark 0.8 is awesome, so please help me make them
work together! :-)

Thanks

maxx