Posted to common-user@hadoop.apache.org by prasenjit <pr...@gmail.com> on 2010/01/18 05:58:16 UTC

which hadoop-ec2 is preferred ( cloudera/hadoop ? )

It seems there are 2 hadoop-ec2 scripts:

1) One which comes along with the hadoop distro :
<hadoop>/src/contrib/ec2/bin/hadoop-ec2

2) Another which is downloadable from
http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-py-0.3.0-beta.tar.gz
and is from cloudera folks. 

I prefer using the base Hadoop, as I want to avoid the dependencies on
boto/simplejson, which are required for (2). My question is: are they planned
to be kept in sync? Which one is under active development, and hence suggested
for stable use (given my preference for Hadoop's contrib package)?

-Thanks,
Prasen

-- 
View this message in context: http://old.nabble.com/Which-instance-type-on-Amazon-EC2--tp25667297p27206207.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.


Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by prasenjit mukherjee <pr...@gmail.com>.
"echo $AWS_SECRET_ACCESS_KEY"   returns a valid string ( as already
mentioned in my mail )

Are there any other settings I need for boto to pick it up?

-Prasen

On Mon, Jan 18, 2010 at 11:53 AM, Chandraprakash Bhagtani <cp...@gmail.com> wrote:
> you need to set following environment variables
>
>
>   - AWS_ACCESS_KEY_ID - Your AWS Access Key ID
>   - AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key
>

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Tom White <to...@cloudera.com>.
Hi Prasen,

2) is now in the Hadoop Common repository, in src/contrib/cloud. This
is where the development effort is focused, and the older bash scripts
(1) will be deprecated over time (HADOOP-6403). The new cloud scripts
are designed to support multiple cloud providers, as well as advanced
features like Amazon's EBS, which the older scripts never did. Indeed,
these features would be difficult to support in bash, which is why the
new scripts use Python. Being able to take advantage of libcloud
(http://incubator.apache.org/libcloud/) makes it feasible to offer
support for more providers using a uniform interface.

It's true that boto is a dependency (although one that is
straightforward to install), but you shouldn't need to install
simplejson unless you are using EBS. (I haven't checked whether this is
actually the case, but if not, it should be considered a bug.)

As for the error you are getting, you seem to have set the environment
variable correctly, so I wonder if it has to do with the version of
boto you are using. I have only used the scripts with version 1.8d;
1.9b came out recently, and I haven't tried them with that version.
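
To find out which boto version the interpreter is actually importing,
something like the following sketch may help. It assumes boto exposes its
version as a module attribute named Version or __version__; that is an
assumption about the installed release rather than something confirmed in
this thread, and installed_version is a hypothetical helper name.

```python
def installed_version(module):
    """Return a version string exposed by a module, or None if none is found.

    The attribute names checked here are assumptions about how the module
    publishes its version; adjust them if the installed release differs.
    """
    for attr in ("__version__", "Version"):
        value = getattr(module, attr, None)
        if value:
            return str(value)
    return None


if __name__ == "__main__":
    try:
        import boto
    except ImportError:
        print("boto is not importable from this interpreter")
    else:
        print("boto version: %s" % installed_version(boto))
```

Run it with the same Python interpreter that runs ./hadoop-ec2, since a
different interpreter may pick up a different boto installation.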

Cheers,

Tom


Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
You need to set the following environment variables:


   - AWS_ACCESS_KEY_ID - Your AWS Access Key ID
   - AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key
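
One way to check that both variables are visible to the Python process
itself, and not only to the interactive shell, is a small script like the
one below. This is a minimal sketch and not part of hadoop-ec2 or boto;
the name missing_aws_credentials is made up for illustration. A variable
that is set in the shell but not exported still prints with echo, yet it
does not reach a child process, so it would be reported as missing here.
Whether that is the cause of the error in this thread is not confirmed.

```python
import os

# The two variable names given in the list above.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_credentials(env=None):
    """Return the required variable names that are unset or empty in env.

    env defaults to this process's environment (os.environ).
    """
    if env is None:
        env = os.environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_aws_credentials()
    if missing:
        print("Not visible to this process: " + ", ".join(missing))
    else:
        print("Both AWS credential variables are visible to this process.")
```

Run it from the same shell session that runs ./hadoop-ec2, so that it
inherits the same environment.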




-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
Impetus Infotech (india) Pvt Ltd.

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by prasenjit mukherjee <pr...@gmail.com>.
Thanks for the suggestion. Now I am getting the following error with
Cloudera's distro, even though I have set AWS_SECRET_ACCESS_KEY
appropriately. Any pointers?

pmukherjee@ubuntu:~/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta$ echo $AWS_SECRET_ACCESS_KEY
<.........snipped......................................>
pmukherjee@ubuntu:~/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta$ ./hadoop-ec2 list
Traceback (most recent call last):
  File "./hadoop-ec2", line 124, in <module>
    list_all()
  File "/home/pmukherjee/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta/hadoop/ec2/commands.py", line 43, in list_all
    clusters = get_clusters_with_role(MASTER)
  File "/home/pmukherjee/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta/hadoop/ec2/cluster.py", line 29, in get_clusters_with_role
    all = EC2Connection().get_all_instances()
  File "/usr/lib/python2.5/site-packages/boto/ec2/connection.py", line 69, in __init__
    self.region.endpoint, debug, https_connection_factory, path)
  File "/usr/lib/python2.5/site-packages/boto/connection.py", line 446, in __init__
    debug,  https_connection_factory, path)
  File "/usr/lib/python2.5/site-packages/boto/connection.py", line 169, in __init__
    self.hmac = hmac.new(self.aws_secret_access_key, digestmod=sha)
AttributeError: EC2Connection instance has no attribute 'aws_secret_access_key'

-Prasen

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Zak Stone <zs...@gmail.com>.
In my experience, the Cloudera distributions are excellent, actively
developed, and well-supported.

Zak



Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Mark Kerzner <ma...@gmail.com>.
My personal experience led me to prefer Cloudera. I can't speak for every
situation, but for me the Hadoop distro had many bugs and was unreliable.

Mark
