Posted to issues@spark.apache.org by "George Hawkins (JIRA)" <ji...@apache.org> on 2017/02/18 14:12:44 UTC

[jira] [Created] (SPARK-19657) start-master.sh accidentally forces the use of a loopback address in master URL

George Hawkins created SPARK-19657:
--------------------------------------

             Summary: start-master.sh accidentally forces the use of a loopback address in master URL
                 Key: SPARK-19657
                 URL: https://issues.apache.org/jira/browse/SPARK-19657
             Project: Spark
          Issue Type: Bug
          Components: Deploy
    Affects Versions: 2.1.0
         Environment: Ubuntu 16.04
            Reporter: George Hawkins


{{start-master.sh}} contains the line:

{noformat}
SPARK_MASTER_HOST="`hostname -f`"
{noformat}

{{\-f}} means return the fully qualified domain name (FQDN) - the assumption seems to be that this name will always resolve to a public IP address (note that if {{start-master.sh}} didn't force the hostname by passing {{--host}}, the default behavior of {{Master}} is to sensibly fall back to a public IP).
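
For context, the surrounding logic in {{start-master.sh}} looks roughly like this (condensed from the 2.1.0 script, with the SunOS branch omitted) - the default is only computed when the variable is unset, but the result is then always forced via {{--host}}:

{noformat}
# sbin/start-master.sh (condensed)
if [ "$SPARK_MASTER_HOST" = "" ]; then
  SPARK_MASTER_HOST="`hostname -f`"   # assumes the FQDN resolves to a public address
fi

"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
  --host $SPARK_MASTER_HOST --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT
{noformat}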

I came across this when I started a master and it output:

{noformat}
17/02/16 23:03:32 INFO Master: Starting Spark master at spark://myhostname:7077
{noformat}

But my external slaves could not connect to this URL, and I was mystified when, on the master machine itself (which has just one public IP address), the following both failed:

{noformat}
$ telnet 192.168.1.133 7077
$ telnet 127.0.0.1 7077
{noformat}

{{192.168.1.133}} was the machine's public IP address, so {{Master}} seemed to be listening on neither the public IP address nor the standard loopback address. However, the following worked:

{noformat}
$ telnet myhostname 7077
{noformat}

It turns out this is a quirk of Debian and Ubuntu systems - the hostname maps to a loopback address, just not to the well-known one, {{127.0.0.1}}.
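
A quick look at the listening socket would have shown this immediately (output is illustrative of what I'd expect on the affected machine; {{ss}} is part of iproute2 on Ubuntu 16.04):

{noformat}
$ ss -ltn | grep 7077   # show listening TCP sockets on port 7077
LISTEN     0      128    127.0.1.1:7077           *:*
{noformat}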

If you look in {{/etc/hosts}} you see:

{noformat}
127.0.0.1   localhost
127.0.1.1   myhostname
{noformat}

I looked at this many times before I noticed that it's not the same IP address on both lines (I never knew that the entire {{127.0.0.0/8}} address block is reserved for loopback purposes - see [localhost|https://en.wikipedia.org/wiki/Localhost] on Wikipedia).
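
Two quick checks make this concrete (output illustrative; {{getent}} consults {{/etc/hosts}} before DNS under the default NSS configuration):

{noformat}
$ getent hosts myhostname    # resolved from /etc/hosts, not DNS
127.0.1.1       myhostname
$ ip route get 127.0.1.1     # the kernel routes all of 127.0.0.0/8 via lo
local 127.0.1.1 dev lo  src 127.0.0.1
{noformat}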

Why do Debian and Ubuntu do this? There seems to have been a good, documented reason way back: the {{127.0.1.1}} line used to always map to an FQDN, i.e. you'd expect to see:

{noformat}
127.0.0.1   localhost
127.0.1.1   myhostname.some.domain
{noformat}

The Debian reference manual used to include the following section:

{quote}
Some software (e.g., GNOME) expects the system hostname to be resolvable to an IP address with a canonical fully qualified domain name. This is really improper because system hostnames and domain names are two very different things; but there you have it. In order to support that software, it is necessary to ensure that the system hostname can be resolved.
{quote}

However the [hostname resolution section|https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution] in the current reference, while still mentioning issues with software like GNOME, no longer says that the {{127.0.1.1}} entry will be an FQDN.

In this [bug report|https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719621] you can see the documentation change being discussed, i.e. removing the statement that {{127.0.1.1}} always maps to an FQDN, but there's no explanation of why it was changed (the stated original purpose of the entry in {{/etc/hosts}} is lost by the change, so it seems odd not to explain it).

So while it may be uncommon for a Spark master in a real cluster setup to lack a static IP and an FQDN, this setup is quite likely for people getting started with Spark, i.e. starting the master on a personal machine running Ubuntu on a network that uses DHCP. And it's quite confusing to find that {{start-master.sh}} has started the master on an address that isn't externally accessible (and it isn't immediately obvious from the master URL that this is the case).

The simple fix seems to be not to pass the {{--host}} argument in {{start-master.sh}} unless {{$SPARK_MASTER_HOST}} is non-empty. When {{--host}} is omitted, the existing Spark logic (working in the Java/Scala world, where it's far easier to query IP addresses, check whether they're loopback addresses, etc.) already works out a sensible default public IP address to use.
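
A minimal sketch of that change (illustrative, based on how {{start-master.sh}} currently invokes {{spark-daemon.sh}}; the {{MASTER_HOST_OPT}} variable is my own):

{noformat}
# Drop the unconditional SPARK_MASTER_HOST="`hostname -f`" default and
# only pass --host when the user set SPARK_MASTER_HOST explicitly,
# letting Master fall back to its own public-address detection otherwise.
MASTER_HOST_OPT=""
if [ -n "$SPARK_MASTER_HOST" ]; then
  MASTER_HOST_OPT="--host $SPARK_MASTER_HOST"
fi

"${SPARK_HOME}/sbin"/spark-daemon.sh start $CLASS 1 \
  $MASTER_HOST_OPT \
  --port $SPARK_MASTER_PORT --webui-port $SPARK_MASTER_WEBUI_PORT
{noformat}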


