Posted to user@spark.apache.org by Izhar ul Hassan <ez...@gmail.com> on 2014/02/01 16:04:54 UTC

java BindException pyspark

Hi,

I am trying to run pyspark with the IPython notebook on a virtual cluster
(OpenStack) that uses GRE tunnels and floating IPs.

The VM has only one ethernet interface, with a PRIVATE IP ADDRESS; there is
no interface with the public IP address, i.e. the public IP is not configured
inside the guest.

The PRIVATE IP ADDRESS is mapped onto a FLOATING IP ADDRESS (like an Amazon
Elastic IP), which means the VM can be reached from the outside world via the
public IP, but the VM itself does not know about the PUBLIC IP ADDRESS.
Therefore Java cannot bind to this PUBLIC_IP and throws a BindException.
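To illustrate what I think is happening (this is just a plain-socket sketch,
not Spark code; 203.0.113.10 stands in for the floating IP, which is not
configured on any interface inside the guest):

import socket

# The guest has no interface configured with the floating/public address,
# so asking the kernel to bind to it fails with EADDRNOTAVAIL -- the same
# "Cannot assign requested address" that surfaces as java.net.BindException.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.bind(("203.0.113.10", 0))   # placeholder for the floating IP
except socket.error as e:
    print(e)                      # e.g. [Errno 99] Cannot assign requested address
finally:
    s.close()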

When I run:

bin/start-master.sh -i <PUBLIC_IP_ADDRESS> -p <somePort>

I get errors such as:

SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
14/01/31 15:44:02 INFO Slf4jEventHandler: Slf4jEventHandler started
Traceback (most recent call last):
  File "/home/hduser/DataAnalysis/spark/python/pyspark/shell.py", line 32, in <module>
    sc = SparkContext(os.environ.get("MASTER", "local"), "PySparkShell", pyFiles=add_files)
  File "/home/hduser/DataAnalysis/spark/python/pyspark/context.py", line 91, in __init__
    empty_string_array)
  File "/home/hduser/DataAnalysis/spark/python/lib/py4j0.7.egg/py4j/java_gateway.py", line 632, in __call__
  File "/home/hduser/DataAnalysis/spark/python/lib/py4j0.7.egg/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.jboss.netty.channel.ChannelException: Failed to bind to: publicResolvableHostname/MY_Public_IP:0
...
Caused by: java.net.BindException: Cannot assign requested address

If I use the local IP address instead, I can run Spark (and pyspark) on the
local interface without any problems, but the IPython notebook still fails
when I access it over the internet. It looks like the driver tries to connect
to spark://PUBLIC_IP:7077 while the master is actually running on
spark://LOCAL_IP:7077, so the connection breaks:


14/02/01 14:11:00 INFO Client$ClientActor: Connecting to master spark://PUBLIC_HOST:7077...
14/02/01 14:11:20 INFO Client$ClientActor: Connecting to master spark://PUBLIC_HOST:7077...
14/02/01 14:11:40 ERROR Client$ClientActor: All masters are unresponsive! Giving up.
14/02/01 14:11:40 ERROR SparkDeploySchedulerBackend: Spark cluster looks dead, giving up.
14/02/01 14:11:40 ERROR ClusterScheduler: Exiting due to error from cluster scheduler: Spark cluster looks down
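
As the traceback above shows, pyspark's shell.py builds the SparkContext
straight from the MASTER environment variable, so the driver dials whatever
address MASTER names. Roughly (spark://10.0.0.12:7077 below is just a
placeholder for my private address, not a value from my setup):

import os
from pyspark import SparkContext

# shell.py takes the master URL from the environment, so if MASTER points at
# the public/floating IP the driver tries an address the master never bound to.
master = os.environ.get("MASTER", "local")   # e.g. spark://PUBLIC_IP:7077 -> unreachable
sc = SparkContext(master, "PySparkShell")    # works when MASTER is e.g. spark://10.0.0.12:7077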

Setting

export SPARK_MASTER_IP=0.0.0.0

in spark-env.sh does not help either.
-- 
/Izhar

P.S. The same settings work on a physical server with public and private
interfaces.