Posted to user@spark.apache.org by Aureliano Buendia <bu...@gmail.com> on 2014/04/19 05:57:48 UTC

Spark-ec2 asks for password

Hi,

Since 0.9.0 spark-ec2 has gone unstable. During launch it throws many
errors like:

ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22:
Connection refused
Error 255 while executing remote command, retrying after 30 seconds

...and recently, it has started prompting for passwords:

Warning: Permanently added '' (RSA) to the list of known hosts.
Password:

Note that the hostname is missing from the "Permanently added ''" line in the
log, which is probably why it asks for a password.

Is this a known bug?
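If the empty hostname is indeed the cause (that is only the poster's hypothesis), one defensive option is to refuse to call ssh with an empty host and to pass OpenSSH's `-o BatchMode=yes`, which disables password prompting so ssh exits with an error instead of blocking on "Password:". A minimal sketch, assuming a Python launcher in the style of spark-ec2; `build_ssh_command` is a hypothetical helper, not part of the spark-ec2 scripts:

```python
def build_ssh_command(user, host, identity_file, remote_command):
    """Build an ssh argv list that fails fast instead of prompting.

    Hypothetical helper for illustration; not part of spark-ec2.
    """
    # An empty host is what the "Permanently added ''" warning suggests;
    # refuse to run ssh at all rather than let it fall back to a prompt.
    if not host or not host.strip():
        raise ValueError("empty hostname passed to ssh")
    return [
        "ssh",
        # BatchMode=yes disables password/passphrase prompts, so ssh
        # exits with an error instead of waiting on "Password:".
        "-o", "BatchMode=yes",
        # Freshly launched instances have host keys we have not seen yet.
        "-o", "StrictHostKeyChecking=no",
        # Give up on a single connection attempt after 10 seconds.
        "-o", "ConnectTimeout=10",
        "-i", identity_file,
        "%s@%s" % (user, host),
        remote_command,
    ]
```

With this, `subprocess.check_call(build_ssh_command("root", host, "key.pem", "ls"))` would raise `CalledProcessError` (ssh exits with status 255 on its own errors) rather than hang when key authentication fails, which a retry loop can then handle.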

Re: Spark-ec2 asks for password

Posted by Pierre Borckmans <pi...@realimpactanalytics.com>.
We’ve been experiencing this as well, and our simple solution is to actually keep trying the ssh connection instead of just waiting:

Something like this:


import subprocess
import time

def wait_for_ssh_connection(opts, host):
    # `s.ssh_command(opts)` is a helper from the surrounding script (not shown);
    # it presumably returns the base ssh argv, e.g. ["ssh", "-i", key_file, ...].
    print("Waiting for ssh connection to host {}".format(host))
    while True:
        try:
            # check_call raises CalledProcessError on any non-zero exit
            # status, e.g. 255 when ssh cannot connect yet.
            subprocess.check_call(
                s.ssh_command(opts) + ['-t', '-t', '%s@%s' % (opts.user, host), 'ls'])
            break
        except subprocess.CalledProcessError:
            print("Ssh connection to host {} failed, retrying in 10 seconds...".format(host))
            time.sleep(10)
    print("Ssh connection to host {} successfully established!".format(host))


HTH

Pierre Borckmans

RealImpact Analytics | Brussels Office
www.realimpactanalytics.com | pierre.borckmans@realimpactanalytics.com

FR +32 485 91 87 31 | Skype pierre.borckmans





On 19 Apr 2014, at 06:51, Patrick Wendell <pw...@gmail.com> wrote:



Re: Spark-ec2 asks for password

Posted by Patrick Wendell <pw...@gmail.com>.
Unfortunately, I think a lot of this is due to generally increased latency
on ec2 itself. I've noticed that it's much more common than it used to be
for instances to come online only after the "wait" timeout in the ec2 script
has expired.
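Rather than a single fixed wait, a launcher can poll for readiness against an overall deadline, so slow instances get more time while fast ones are not held back. A minimal sketch; `wait_until` and its parameters are hypothetical, and `predicate` stands in for whatever readiness check the script actually uses (for example an SSH or port-22 probe):

```python
import time

def wait_until(predicate, timeout_s, interval_s):
    """Poll predicate() until it returns True or timeout_s seconds elapse.

    Returns True on success, False on timeout. Hypothetical helper.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if predicate():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the overall deadline.
        time.sleep(min(interval_s, remaining))
```

The design choice is to bound the total time instead of the number of tries, so raising `timeout_s` for a slow region does not change how quickly a healthy cluster is detected.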


On Fri, Apr 18, 2014 at 9:11 PM, FRANK AUSTIN NOTHAFT <fnothaft@berkeley.edu> wrote:


Re: Spark-ec2 asks for password

Posted by Mayur Rustagi <ma...@gmail.com>.
Hi
We have a deployment tool for GCE that we use internally for Spark. Let me
know if you want access to it. It's not really clean enough to open-source,
though :).
Regards
Mayur


Mayur Rustagi
Ph: +1 (760) 203 3257
http://www.sigmoidanalytics.com
@mayur_rustagi <https://twitter.com/mayur_rustagi>



On Sat, Apr 19, 2014 at 10:24 AM, Aureliano Buendia <bu...@gmail.com> wrote:


Re: Spark-ec2 asks for password

Posted by Aureliano Buendia <bu...@gmail.com>.
Frank,

Thanks for the prompt reply. Unfortunately, I've been experiencing this for
the past few weeks on the N. Virginia farm; note that the latency might also
depend on the instance type.

I'll try to amend the ec2 script as you suggested, but that will mean
waiting even longer for the cluster to come up. The current waiting time
is already far from short (over 15 minutes for 50 instances).
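One way to keep a longer per-host wait from adding up across a large cluster is to probe every host concurrently, so each polling round costs about one connection timeout instead of one timeout per host. A minimal sketch, assuming readiness is approximated by port 22 accepting TCP connections; `port_open` and `hosts_not_ready` are hypothetical helpers, not part of spark-ec2:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def port_open(host, port=22, timeout_s=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # connection refused, timeout, or unresolvable name
        return False

def hosts_not_ready(hosts, port=22, timeout_s=5.0, max_workers=32):
    """Probe all hosts concurrently; return those whose port is still closed.

    With up to max_workers hosts, one round takes roughly timeout_s,
    not len(hosts) * timeout_s.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda h: port_open(h, port, timeout_s), hosts))
    return [h for h, ok in zip(hosts, results) if not ok]
```

A launcher could call `hosts_not_ready(...)` every 30 seconds and continue once it returns an empty list. An open port only shows that sshd is listening; key-based login may still need a few more seconds.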

I have tried this with and without spot pricing, and there was no
difference. It seems like Amazon is not keeping up with the demand for
clustering.

I wish Spark would officially support Google Compute Engine as well,
especially with the recent price drop, and given that GCE is known to start
up much faster [1].


[1]
http://gigaom.com/2013/03/15/by-the-numbers-how-google-compute-engine-stacks-up-to-amazon-ec2/



On Sat, Apr 19, 2014 at 5:11 AM, FRANK AUSTIN NOTHAFT <fnothaft@berkeley.edu> wrote:


Re: Spark-ec2 asks for password

Posted by FRANK AUSTIN NOTHAFT <fn...@berkeley.edu>.
Aureliano,

I've been noticing this error recently as well:

ssh: connect to host ec-xx-xx-xx-xx.compute-1.amazonaws.com port 22:
Connection refused
Error 255 while executing remote command, retrying after 30 seconds

However, this isn't an issue with the spark-ec2 scripts. After the scripts
fail, if you wait a bit longer (e.g., another 2 minutes), the EC2 hosts
will finish launching and port 22 will open up. Until the EC2 host has
launched and opened port 22 for SSH, SSH cannot succeed, and the Spark-ec2
scripts will fail. I've noticed that EC2 machine launch latency seems to be
highest in Oregon; I haven't run into this problem on either the California
or Virginia EC2 farms. To work around this issue, I've manually modified my
copy of the EC2 scripts to wait for 6 failures (i.e., 3 minutes), which
seems to work OK. Might be worth a try on your end. I can't comment about
the password request; I haven't seen that on my end.
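The workaround of tolerating 6 failures at 30-second intervals works out to 6 × 30 s = 180 s, i.e. 3 minutes of extra waiting. A minimal sketch of such a bounded retry, probing port 22 directly so that "not launched yet" is distinguished from other errors; `wait_for_ssh_port` and its parameter names are hypothetical, not the actual spark-ec2 code:

```python
import socket
import time

def wait_for_ssh_port(host, port=22, max_failures=6, interval_s=30.0,
                      connect_timeout_s=5.0):
    """Wait until a TCP connection to host:port succeeds.

    Tolerates up to max_failures failed attempts, sleeping interval_s after
    each one; with the defaults that is 6 x 30 s = 180 s of waiting.
    Returns the number of failures seen before success, and raises
    TimeoutError once the budget is exhausted. Hypothetical helper.
    """
    failures = 0
    while True:
        try:
            # A successful connect means sshd is at least accepting connections.
            with socket.create_connection((host, port), timeout=connect_timeout_s):
                return failures
        except OSError:  # refused, timed out, or DNS not resolvable yet
            failures += 1
            if failures > max_failures:
                raise TimeoutError("%s:%d still closed after %d failures"
                                   % (host, port, failures))
            print("%s:%d not open yet (failure %d/%d), retrying in %g s"
                  % (host, port, failures, max_failures, interval_s))
            time.sleep(interval_s)
```

Raising `max_failures` trades a longer worst-case launch for fewer spurious failures in slow regions; the wait ends as soon as the port opens, so fast launches are not slowed down.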

Regards,

Frank Austin Nothaft
fnothaft@berkeley.edu
fnothaft@eecs.berkeley.edu
202-340-0466


On Fri, Apr 18, 2014 at 8:57 PM, Aureliano Buendia <bu...@gmail.com> wrote:
