Posted to user@spark.apache.org by Aaron Babcock <aa...@gmail.com> on 2013/10/05 23:45:31 UTC

spark through vpn, SPARK_LOCAL_IP

Hello,

I am using Spark through a VPN. My driver machine ends up with two IP
addresses: one routable from the cluster and one not.

Things generally work when I set the SPARK_LOCAL_IP environment
variable to the routable address.
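
For reference, a minimal way to set it (a sketch; it assumes the
routable VPN address is the 192.168.250.47 that appears in the log
below) is in conf/spark-env.sh on the driver:

# conf/spark-env.sh on the driver: bind Spark's sockets to the routable VPN address
export SPARK_LOCAL_IP=192.168.250.47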

However, when I try to use the take function, i.e. myRdd.take(1), I run
into a hiccup. From the log files on the workers I can see that they
are trying to connect to the non-routable IP address; somehow they are
not respecting SPARK_LOCAL_IP.

Here is the relevant worker log snippet: 192.168.250.47 is the correct,
routable IP address of the driver, and 192.168.0.7 is the non-routable
address. Any thoughts about what else I need to configure?

13/10/05 16:17:36 INFO ConnectionManager: Accepted connection from
[192.168.250.47/192.168.250.47]
13/10/05 16:18:41 WARN SendingConnection: Error finishing connection
to /192.168.0.7:60513
java.net.ConnectException: Connection timed out
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
at spark.network.SendingConnection.finishConnect(Connection.scala:221)
at spark.network.ConnectionManager.spark$network$ConnectionManager$$run(ConnectionManager.scala:127)
at spark.network.ConnectionManager$$anon$4.run(ConnectionManager.scala:70)
13/10/05 16:18:41 INFO ConnectionManager: Handling connection error on
connection to ConnectionManagerId(192.168.0.7,60513)
13/10/05 16:18:41 INFO ConnectionManager: Removing SendingConnection
to ConnectionManagerId(192.168.0.7,60513)

HCatalog and spark

Posted by Chester <ch...@yahoo.com>.
Does anyone have experience with Spark accessing files via HCatalog?

Thanks

Chester

Re: spark through vpn, SPARK_LOCAL_IP

Posted by viren kumar <vi...@gmail.com>.
Is that really the only solution? I am facing the same problem: the
driver runs on a machine with two IPs, one internal and one external,
and when I launch a job the Spark cluster fails to connect back to the
driver because it tries the internal IP. I tried setting
SPARK_LOCAL_IP, but to no avail, and I can't change the hostname of
the machine either. Any thoughts, anyone?

Viren



Re: spark through vpn, SPARK_LOCAL_IP

Posted by Aaron Babcock <aa...@gmail.com>.
Replying to document my fix:

I was able to trick Spark into working by setting the hostname to my
preferred IP address:

$ sudo hostname 192.168.250.47

I am not sure this is a good idea in general, but it worked well enough
for me to develop with my MacBook driving the cluster through the VPN.
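
A variant that might achieve the same effect without renaming the
machine (untested, and it rests on the assumption that Spark
advertises whatever address the local hostname resolves to) would be
to point the existing hostname at the routable IP in /etc/hosts:

# /etc/hosts on the driver; "mymacbook" is a placeholder for the real hostname
192.168.250.47   mymacbook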



Re: spark through vpn, SPARK_LOCAL_IP

Posted by Aaron Babcock <aa...@gmail.com>.
Hmm, that did not seem to do it.

Interestingly, the problem only appears with rdd.take(1);
rdd.collect() works just fine.
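
One quick check that may narrow this down (an assumption on my part,
not something verified in this thread: the address the workers dial
back to may come from resolving the driver's hostname rather than from
SPARK_LOCAL_IP) is to see what the driver's hostname resolves to:

$ hostname
$ ping -c 1 "$(hostname)"   # the address printed here is likely the one being advertised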


Re: spark through vpn, SPARK_LOCAL_IP

Posted by Aaron Davidson <il...@gmail.com>.
You might also try setting spark.driver.host to the correct IP, via
SPARK_JAVA_OPTS in conf/spark-env.sh, e.g.:

-Dspark.driver.host=192.168.250.47
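
Spelled out in conf/spark-env.sh, as a sketch for 0.8-era Spark
(adjust the address to the driver's routable VPN IP):

# conf/spark-env.sh on the driver
export SPARK_JAVA_OPTS="$SPARK_JAVA_OPTS -Dspark.driver.host=192.168.250.47"
export SPARK_LOCAL_IP=192.168.250.47   # as discussed earlier in the thread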


