Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2019/05/21 04:02:14 UTC

[jira] [Updated] (SPARK-19657) start-master.sh accidentally forces the use of a loopback address in master URL

     [ https://issues.apache.org/jira/browse/SPARK-19657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-19657:
---------------------------------
    Labels: bulk-closed  (was: )

> start-master.sh accidentally forces the use of a loopback address in master URL
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-19657
>                 URL: https://issues.apache.org/jira/browse/SPARK-19657
>             Project: Spark
>          Issue Type: Bug
>          Components: Deploy
>    Affects Versions: 2.1.0
>         Environment: Ubuntu 16.04
>            Reporter: George Hawkins
>            Priority: Major
>              Labels: bulk-closed
>
> {{start-master.sh}} contains the line:
> {noformat}
> SPARK_MASTER_HOST="`hostname -f`"
> {noformat}
> {{-f}} means get the FQDN - the assumption seems to be that this name will always resolve to a public IP address (note that if {{start-master.sh}} didn't force the hostname via {{--host}}, the default behavior of {{Master}} is to sensibly pick a public IP on its own).
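> To make this concrete, here is a rough sketch of the pattern in question (simplified for illustration - the variable name and {{hostname -f}} call are from the script, but the launch line below is only indicative, not the literal script):
> {noformat}
> # simplified illustration of the current behavior, not the literal start-master.sh
> SPARK_MASTER_HOST="`hostname -f`"
> # ... the resolved name is then forced onto the master, roughly:
> #   --host "$SPARK_MASTER_HOST" --port "$SPARK_MASTER_PORT"
> {noformat}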
> I came across this when I started a master and it output:
> {noformat}
> 17/02/16 23:03:32 INFO Master: Starting Spark master at spark://myhostname:7077
> {noformat}
> But my external slaves could not connect to this URL, and I was mystified when, on the master machine (which has just one public IP address), both of the following failed:
> {noformat}
> $ telnet 192.168.1.133 7077
> $ telnet 127.0.0.1 7077
> {noformat}
> {{192.168.1.133}} was the machine's public IP address, and {{Master}} seemed to be listening on neither the public IP address nor the loopback address. However, the following worked:
> {noformat}
> $ telnet myhostname 7077
> {noformat}
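> A quick way to check which local address the master is actually listening on (standard Linux tools - on the machine above I'd expect this to show {{127.0.1.1:7077}} rather than the public address):
> {noformat}
> $ sudo ss -tlnp | grep 7077
> # or, on older systems:
> $ sudo netstat -tlnp | grep 7077
> {noformat}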
> It turns out this is a quirk of Debian and Ubuntu systems - the hostname maps to a loopback address, but not to the well-known one, {{127.0.0.1}}.
> If you look in {{/etc/hosts}} you see:
> {noformat}
> 127.0.0.1   localhost
> 127.0.1.1   myhostname
> {noformat}
> I looked at this many times before I noticed that it's not the same IP address on both lines (I never knew that the entire {{127.0.0.0/8}} address block is reserved for loopback purposes - see [localhost|https://en.wikipedia.org/wiki/Localhost] on Wikipedia).
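> A simple way to confirm the mapping is to ask the resolver directly (standard commands - the output shown is what I'd expect on a default Ubuntu install like the one above):
> {noformat}
> $ getent hosts "$(hostname)"
> 127.0.1.1       myhostname
> $ hostname -I    # all addresses actually assigned to interfaces
> 192.168.1.133
> {noformat}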
> Why do Debian and Ubuntu do this? There seems to have been a good, documented reason for it in the past - the {{127.0.1.1}} line used to always map to an FQDN, i.e. you'd expect to see:
> {noformat}
> 127.0.0.1   localhost
> 127.0.1.1   myhostname.some.domain
> {noformat}
> The Debian reference manual used to include the following section:
> {quote}
> Some software (e.g., GNOME) expects the system hostname to be resolvable to an IP address with a canonical fully qualified domain name. This is really improper because system hostnames and domain names are two very different things; but there you have it. In order to support that software, it is necessary to ensure that the system hostname can be resolved.
> {quote}
> However, the [hostname resolution section|https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution] in the current reference, while still mentioning issues with software like GNOME, no longer says that the {{127.0.1.1}} entry will map to an FQDN.
> In this [bug report|https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=719621] you can see the documentation change being discussed, i.e. removing the statement that {{127.0.1.1}} always maps to an FQDN, but no reason for the change is given (the stated original purpose of this {{/etc/hosts}} entry is lost by the change, so it seems odd not to explain it).
> So while a Spark master without a static IP and an FQDN may be uncommon in a real cluster setup, this setup is quite likely for people getting started with Spark - i.e. starting the master on their personal machine running Ubuntu on a network that uses DHCP. And it's quite confusing to find that {{start-master.sh}} has started the master on an address that isn't externally accessible (and it isn't immediately obvious from the master URL that this is the case).
> The simple solution seems to be not to pass the {{--host}} argument in {{start-master.sh}} unless {{$SPARK_MASTER_HOST}} is non-empty. When no host is forced, the existing Spark logic (working in the Java/Scala world, where it's far easier to query IP addresses, check whether they're loopback addresses, etc.) already works out a sensible default public IP address to use.
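> A hedged sketch of that change ({{MASTER_HOST_OPT}} is just an illustrative name, and the launch line is only indicative, not the literal {{start-master.sh}}):
> {noformat}
> # Only force --host when the user explicitly set SPARK_MASTER_HOST
> # (and drop the `hostname -f` default shown above); otherwise let
> # Master work out a sensible public address itself.
> MASTER_HOST_OPT=""
> if [ -n "$SPARK_MASTER_HOST" ]; then
>   MASTER_HOST_OPT="--host $SPARK_MASTER_HOST"
> fi
> # indicative launch line:
> #   spark-daemon.sh start ... $MASTER_HOST_OPT --port "$SPARK_MASTER_PORT" ...
> {noformat}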



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org