Posted to user@spark.apache.org by Toby Douglass <to...@avocet.io> on 2014/06/12 21:38:19 UTC

spark EC2 bring-up problems

Gents,

I have been bringing up a cluster on EC2 using the spark_ec2.py script.

This works if the cluster has a single slave.

This fails if the cluster has sixteen slaves, during the work to transfer
the SSH key to the slaves.  I cannot currently bring up a large cluster.

Can anyone shed any light on this issue?

As an aside, that script does not work out of the box in Amazon EC2
instances.  Python 2.7 must be installed, and then boto for 2.7.
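
A minimal sketch of a pre-flight check for the two requirements above, assuming the only things to verify are the interpreter version and whether boto can be imported. The function name `preflight` is made up for illustration and is not part of spark_ec2.py.

```python
import sys


def preflight(min_version=(2, 7), module="boto"):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    # Compare only (major, minor) against the required minimum.
    if sys.version_info[:2] < min_version:
        problems.append("Python %d.%d or newer is required, found %d.%d"
                        % (min_version + tuple(sys.version_info[:2])))
    # boto must be importable by *this* interpreter, not just installed somewhere.
    try:
        __import__(module)
    except ImportError:
        problems.append("module %r is not importable by this interpreter" % module)
    return problems


if __name__ == "__main__":
    for line in preflight():
        print(line)
```

Run it with the same interpreter that will run the script; a boto installed for a different Python version shows up as the second problem.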

Re: spark EC2 bring-up problems

Posted by Toby Douglass <to...@avocet.io>.
On Thu, Jun 12, 2014 at 9:10 PM, Zongheng Yang <zo...@gmail.com> wrote:

> Hi Toby,
>
> It is usually the case that even if the EC2 console says the nodes are
> up, they are not really fully initialized. For 16 nodes I have found
> `--wait 800` to be the norm that makes things work.
>

It seems so!  resume worked fine, which fits.

> In my previous experience I have found this to be the culprit, so if
> you immediately do 'launch --resume' when you see the first SSH error
> it's still very likely to fail. But if you wait a little bit longer
> and do 'launch --resume', it could work.
>

It did.  Thank you :-)

Re: spark EC2 bring-up problems

Posted by Zongheng Yang <zo...@gmail.com>.
Hi Toby,

It is usually the case that even if the EC2 console says the nodes are
up, they are not really fully initialized. For 16 nodes I have found
`--wait 800` to be the norm that makes things work.

In my previous experience I have found this to be the culprit, so if
you immediately do 'launch --resume' when you see the first SSH error
it's still very likely to fail. But if you wait a little bit longer
and do 'launch --resume', it could work.
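
One way to take the guesswork out of "wait a little bit longer" is to poll the SSH port on each node before resuming. This is only a sketch: the helper names `ssh_ready` and `wait_for_ssh` are hypothetical, the host list has to come from your own EC2 console or tooling, and an open port 22 is assumed to be a reasonable (not guaranteed) sign that the node will accept the key transfer.

```python
import socket
import time


def ssh_ready(host, port=22, timeout=3.0):
    """True if a TCP connection to host:port can be opened within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def wait_for_ssh(hosts, port=22, deadline=800, interval=10):
    """Poll until every host accepts connections, or deadline seconds pass.

    Returns the hosts that never became reachable (empty list on success).
    """
    pending = list(hosts)
    give_up_at = time.time() + deadline
    while pending:
        # Keep only the hosts that still refuse or time out.
        pending = [h for h in pending if not ssh_ready(h, port)]
        if not pending or time.time() >= give_up_at:
            break
        time.sleep(interval)
    return pending
```

If `wait_for_ssh` returns an empty list, that is a sensible moment to run 'launch --resume'; any hosts it returns are the ones still not answering.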

Zongheng

On Thu, Jun 12, 2014 at 1:03 PM, Toby Douglass <to...@avocet.io> wrote:
> On Thu, Jun 12, 2014 at 8:50 PM, Nicholas Chammas
> <ni...@gmail.com> wrote:
>>
>> Yes, you need Python 2.7 to run spark-ec2 and most AMIs come with 2.6.
>
> Ah, yes - I mean to say, Amazon Linux.
>>
>> Have you tried either:
>>
>> Retrying launch with the --resume option?
>> Increasing the value of the --wait option?
>
> No.  I will try the first, now.  I think the latter is not the issue - the
> instances are up; something else is amiss.
>
> Thank you.
>

Re: spark EC2 bring-up problems

Posted by Toby Douglass <to...@avocet.io>.
On Thu, Jun 12, 2014 at 8:50 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> Yes, you need Python 2.7 to run spark-ec2 and most AMIs come with 2.6.
>
Ah, yes - I mean to say, Amazon Linux.

> Have you tried either:
>
>    1. Retrying launch with the --resume option?
>    2. Increasing the value of the --wait option?
>

No.  I will try the first, now.  I think the latter is not the issue - the
instances are up; something else is amiss.

Thank you.

Re: spark EC2 bring-up problems

Posted by Nicholas Chammas <ni...@gmail.com>.
Yes, you need Python 2.7 to run spark-ec2 and most AMIs come with 2.6.

Have you tried either:

   1. Retrying launch with the --resume option?
   2. Increasing the value of the --wait option?
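
For concreteness, here is a sketch of what the two suggestions look like as command lines, built in Python without running anything. The cluster name, key pair and key file are placeholders, and the `-k`, `-i` and `-s` flags are assumed from the script's usual usage; only `--wait` and `--resume` are the options under discussion here.

```python
def launch_command(cluster, key_pair, identity_file, slaves,
                   wait=None, resume=False, script="./spark-ec2"):
    """Build the argv list for a spark-ec2 launch, without running it."""
    argv = [script, "-k", key_pair, "-i", identity_file, "-s", str(slaves)]
    if wait is not None:
        # Seconds to wait for the instances before the setup steps begin.
        argv += ["--wait", str(wait)]
    if resume:
        # Continue setting up a cluster whose instances already exist.
        argv.append("--resume")
    argv += ["launch", cluster]
    return argv


# Placeholder names; substitute your own key pair, key file and cluster.
first_try = launch_command("my-cluster", "my-keypair", "my-keypair.pem", 16, wait=800)
retry = launch_command("my-cluster", "my-keypair", "my-keypair.pem", 16, resume=True)
print(" ".join(first_try))
print(" ".join(retry))
```

The returned list can be passed to `subprocess.call` as is, which avoids shell quoting problems with key file paths.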

Nick

On Thu, Jun 12, 2014 at 3:38 PM, Toby Douglass <to...@avocet.io> wrote:

> Gents,
>
> I have been bringing up a cluster on EC2 using the spark_ec2.py script.
>
> This works if the cluster has a single slave.
>
> This fails if the cluster has sixteen slaves, during the work to transfer
> the SSH key to the slaves.  I cannot currently bring up a large cluster.
>
> Can anyone shed any light on this issue?
>
> As an aside, that script does not work out of the box in Amazon EC2
> instances.  Python 2.7 must be installed, and then boto for 2.7.
>
>