Posted to common-user@hadoop.apache.org by Mark Stetzer <st...@gmail.com> on 2009/10/19 17:41:10 UTC

Terminate Instances Terminating ALL EC2 Instances

Hey all,

While running the (latest as of Friday) Cloudera-created EC2 scripts,
I noticed that running the terminate-cluster script kills ALL of your
EC2 nodes, not just those associated with the cluster.  This has been
documented before in HADOOP-1504
(http://issues.apache.org/jira/browse/HADOOP-1504), and a fix was
integrated way back on June 21, 2007.  My questions are:

1)  Is anyone else seeing this?  I can reproduce this behavior consistently.
AND
2)  Is this a regression in the common code, a problem with the
Cloudera scripts, or just user error on my part?

Just trying to get to the bottom of this so no one else has to see all
of their EC2 instances die accidentally :(

Thanks!

-Mark

Re: Terminate Instances Terminating ALL EC2 Instances

Posted by Tom White <to...@cloudera.com>.
On Mon, Oct 19, 2009 at 5:34 PM, Mark Stetzer <st...@gmail.com> wrote:
> Hi Tom,
>
> The terminate-cluster script only lists the instances that are part of
> the cluster (master and all slaves) as far as I can tell.  As an
> example, I set up a cluster of 1 master and 5 slaves, then started an
> additional non-Hadoop server via the AWS mgmt. console running a
> completely different AMI (OpenSolaris 2009.06 just to be very
> different).  terminate-cluster only listed the 6 instances that were
> part of the cluster if I remember correctly.
>
> I have 4 security groups:  default, default-master, default-slave, and
> mark-default.  mark-default wasn't even added until after I started
> the Hadoop cluster; I added it to log in to the OpenSolaris instance.

I think there is a bug here. I've filed
https://issues.apache.org/jira/browse/HADOOP-6320. As an immediate
workaround you can avoid calling the Hadoop cluster "default", and
make sure that you don't create non-Hadoop EC2 instances in the
cluster group.

Thanks,
Tom
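
The workaround above can be illustrated with a toy model. This is only a
sketch, not code from the Hadoop EC2 scripts and not a diagnosis of
HADOOP-6320. It assumes instances are selected by a security group named
after the cluster (the group names default, default-master and
default-slave in this thread suggest that naming), and that the non-Hadoop
server was launched into EC2's own "default" group. The function names,
instance ids and the cluster name "hadoop-test" are hypothetical.

```python
# Illustrative model only: plain dicts stand in for EC2 instances, and the
# function names are hypothetical, not taken from the Hadoop EC2 scripts.


def make_instances(cluster_name, slaves=5):
    """Build a toy inventory: one master, some slaves, one unrelated server."""
    instances = [
        {"id": "master", "groups": [cluster_name, cluster_name + "-master"]}
    ]
    for n in range(slaves):
        instances.append(
            {"id": "slave-%d" % n,
             "groups": [cluster_name, cluster_name + "-slave"]}
        )
    # Assumption: the unrelated server sits in EC2's "default" group.
    instances.append({"id": "opensolaris", "groups": ["default"]})
    return instances


def instances_in_group(instances, group):
    """Select every instance that belongs to the given security group."""
    return [i["id"] for i in instances if group in i["groups"]]


if __name__ == "__main__":
    for name in ("default", "hadoop-test"):
        selected = instances_in_group(make_instances(name), name)
        # Report how many instances matched and whether the unrelated
        # server was caught by the selection.
        print(name, len(selected), "opensolaris" in selected)
```

In this model a cluster named "default" selects 7 instances, including the
unrelated server, while a cluster named "hadoop-test" selects only its own
6. That is the collision the workaround avoids.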

Re: Terminate Instances Terminating ALL EC2 Instances

Posted by Mark Stetzer <st...@gmail.com>.
Hi Tom,

The terminate-cluster script only lists the instances that are part of
the cluster (master and all slaves) as far as I can tell.  As an
example, I set up a cluster of 1 master and 5 slaves, then started an
additional non-Hadoop server via the AWS mgmt. console running a
completely different AMI (OpenSolaris 2009.06 just to be very
different).  terminate-cluster only listed the 6 instances that were
part of the cluster if I remember correctly.

I have 4 security groups:  default, default-master, default-slave, and
mark-default.  mark-default wasn't even added until after I started
the Hadoop cluster; I added it to log in to the OpenSolaris instance.

Does this help at all?  Thanks.

-Mark

On Mon, Oct 19, 2009 at 11:52 AM, Tom White <to...@cloudera.com> wrote:
> Hi Mark,
>
> Sorry to hear that all your EC2 instances were terminated. Needless to
> say, this should certainly not happen.
>
> The scripts are a Python rewrite (see HADOOP-6108) of the bash ones so
> HADOOP-1504 is not applicable, but the behaviour should be the same:
> the terminate-cluster command lists the instances that it will
> terminate, and prompts for confirmation that they should be
> terminated. Is it listing instances that are not in the cluster? I
> have used this script a lot and it has never terminated any instances
> that are not in the cluster.
>
> What are the names of the security groups that the instances are in
> (both those in the cluster, and those outside the cluster that are
> inadvertently terminated)?
>
> Thanks,
> Tom

Re: Terminate Instances Terminating ALL EC2 Instances

Posted by Tom White <to...@cloudera.com>.
Hi Mark,

Sorry to hear that all your EC2 instances were terminated. Needless to
say, this should certainly not happen.

The scripts are a Python rewrite (see HADOOP-6108) of the bash ones so
HADOOP-1504 is not applicable, but the behaviour should be the same:
the terminate-cluster command lists the instances that it will
terminate, and prompts for confirmation that they should be
terminated. Is it listing instances that are not in the cluster? I
have used this script a lot and it has never terminated any instances
that are not in the cluster.

What are the names of the security groups that the instances are in
(both those in the cluster, and those outside the cluster that are
inadvertently terminated)?

Thanks,
Tom
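
The list-then-confirm behaviour described above could be sketched roughly
as follows. This is a minimal illustration under stated assumptions, not
the actual terminate-cluster code: the function name and its parameters
(terminate, ask, out) are hypothetical, and the terminate callback stands
in for whatever call really stops the instances. The one design point it
tries to show is that the ids shown to the user are the same ids that get
terminated.

```python
def terminate_with_confirmation(instance_ids, terminate, ask=input, out=print):
    """List the instances, ask for confirmation, then terminate only those."""
    ids = list(instance_ids)  # freeze the list that the user is shown
    if not ids:
        out("No instances to terminate.")
        return []
    out("The following instances will be terminated:")
    for instance_id in ids:
        out("  " + instance_id)
    answer = ask("Terminate these instances? (yes/no): ")
    if answer.strip().lower() != "yes":
        out("Aborted; nothing was terminated.")
        return []
    terminate(ids)  # act on exactly the ids that were displayed
    return ids


if __name__ == "__main__":
    terminated = []
    # A canned answer replaces the interactive prompt for this demo.
    terminate_with_confirmation(
        ["master", "slave-0"], terminated.extend, ask=lambda prompt: "yes"
    )
    print(terminated)
```

Passing the prompt and the terminate step in as parameters keeps the
sketch testable without touching real instances.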
