Posted to common-user@hadoop.apache.org by Kevin Peterson <kp...@biz360.com> on 2009/09/29 19:19:49 UTC

Which instance type on Amazon EC2?

Has anyone done any extensive testing of what instance types on Amazon EC2
give you the most bang for the buck?

Given the normal Hadoop recommendations of beefy machines, I would expect
the best performance from the extra-large, but our testing showed otherwise.
We did some rough testing while we were just getting started, with roughly a
10-node cluster, and we found that the extra-large instance doesn't come close
to twice the actual performance of the large instance (priced at $0.80 and
$0.40 per hour, respectively). My rationalization is that some of the
resources are shared, and the extra-large instance corresponds to the actual
hardware, while the large instance sometimes gets to take advantage of IO and
network bandwidth beyond 50% when the other tenant isn't doing much.
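
One way to make this comparison concrete is cost per unit of work: hourly price divided by relative throughput. The sketch below uses the two prices quoted above; the 1.5x speedup for the extra-large instance is a made-up placeholder rather than a measurement, and the function name is purely illustrative.

```python
def cost_per_unit_of_work(hourly_price, relative_throughput):
    """Dollars per unit of work, with throughput relative to a baseline instance."""
    if relative_throughput <= 0:
        raise ValueError("relative_throughput must be positive")
    return hourly_price / relative_throughput


# Prices from the post; throughput is normalized so that large == 1.0.
large = cost_per_unit_of_work(0.40, 1.0)

# ASSUMPTION: 1.5x is a hypothetical speedup, for illustration only.
xlarge = cost_per_unit_of_work(0.80, 1.5)

# The extra-large instance breaks even only if its speedup matches the price ratio.
break_even_speedup = 0.80 / 0.40

print("large:  $%.3f per unit of work" % large)
print("xlarge: $%.3f per unit of work" % xlarge)
print("xlarge breaks even at %.1fx the throughput of large" % break_even_speedup)
```

Plugging in a measured speedup in place of the placeholder shows directly whether the bigger instance pays for itself on a given workload.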

I'm revisiting our config because we're deploying HBase soon, and I'm not
sure whether I would be better off going to the extra-large instances so
that I can co-locate the tasktrackers and the region servers on the same
nodes, or if I should stick with large instances and put HBase on separate
servers. Mostly I'm wondering if my results were a fluke.

Re: Which instance type on Amazon EC2?

Posted by Ted Dunning <te...@gmail.com>.
In our experiments, the large instance turned out better, but that was
largely due to our need for substantial memory.  For many of our jobs, the
difference between 4x as many small nodes and large nodes was not
substantial.  We had less than 2x gain from extra-large nodes.

For small-memory Hadoop jobs, the small instance might well turn out
better.  It really depends on how badly it hurts you to have duplicated
memory contents for overhead like the JVM, the task tracker and the OS.

-- 
Ted Dunning, CTO
DeepDyve

Re: Which instance type on Amazon EC2?

Posted by Paul Ingles <pa...@oobaloo.co.uk>.
Hi,

I don't have any real benchmarks or testing to speak of specifically
for the performance benefits of a larger instance size. However, we
have played around a little, and for our work (a form of document
clustering) the benefits of a larger instance were far outweighed by
having more of the less powerful instances. During the early days of
our experiments with Hadoop and EC2, this was by far and away the most
surprising thing (although in retrospect I guess it's not so strange!).

Not sure it answers your question, but food for thought hopefully.

Thanks,
Paul


Re: Which instance type on Amazon EC2?

Posted by Brian Bockelman <bb...@cse.unl.edu>.
Hey Kevin,

In presentations from the high-energy physics (HEP) field (totally
unrelated to Hadoop), I've seen folks claim the large instance is more
than 4x better than the small, and less than 2x slower than the
extra-large. I.e., it gave that application the best bang for its buck.

In other words, you're not completely crazy for believing this, and
other people have reported seeing non-linear differences between the
different instance types.  I suspect the "best" will depend highly on
what your app is doing.

Brian


Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by prasenjit mukherjee <pr...@gmail.com>.
"echo $AWS_SECRET_ACCESS_KEY"   returns a valid string ( as already
mentioned in my mail )

Are there any other settings I need for boto to pick it up?

-Prasen


Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Tom White <to...@cloudera.com>.
Hi Prasen,

2) is now in the Hadoop Common repository, in src/contrib/cloud. This
is where the development effort is focused, and the older bash scripts
(1) will be deprecated over time (HADOOP-6403). The new cloud scripts
are designed to support multiple cloud providers, as well as advanced
features like Amazon's EBS, which the older scripts never did. Indeed,
these features would be difficult to support in bash, which is why the
new scripts use Python. Being able to take advantage of libcloud
(http://incubator.apache.org/libcloud/) makes it feasible to offer
support for more providers using a uniform interface.

It's true that boto is a dependency (although one that is
straightforward to install), but you shouldn't need to install
simplejson unless you are using EBS (I haven't checked whether this is
actually the case, but if not, it should be considered a bug).

As for the error you are getting, you seem to have set the environment
variable correctly, so I wonder if it is to do with the version of
boto you are using. I have only used the scripts with version 1.8d,
but 1.9b came out recently, and I haven't tried them with this
version.
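
Since the boto version is the suspect here, it may help to confirm which copy of boto Python is actually importing. This is only a minimal sketch: the helper name is hypothetical, and it assumes the installed boto exposes its version as `__version__` or the older `Version` attribute, reporting "unknown" if neither is present.

```python
def describe_boto():
    """Return a one-line description of the boto install Python would import."""
    try:
        import boto
    except ImportError:
        return "boto is not importable from this interpreter"
    # ASSUMPTION: the version lives in __version__ or the older Version attribute.
    version = getattr(boto, "__version__", None) or getattr(boto, "Version", "unknown")
    location = getattr(boto, "__file__", "unknown location")
    return "boto %s loaded from %s" % (version, location)


print(describe_boto())
```

Running this with the same interpreter that launches the scripts also shows whether more than one boto install is on the path.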

Cheers,

Tom


Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
You need to set the following environment variables:


   - AWS_ACCESS_KEY_ID - Your AWS Access Key ID
   - AWS_SECRET_ACCESS_KEY - Your AWS Secret Access Key
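
A variable that `echo` prints is not necessarily visible to a child process such as a Python script: the shell only passes it along if it has been exported. A quick sketch to check what Python itself sees (the helper name is hypothetical; only the two variable names come from the list above):

```python
import os

REQUIRED = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def missing_aws_variables(environ=None):
    """Return the required variable names that are unset or empty in the environment."""
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED if not environ.get(name)]


# Report names only; never print the secret values themselves.
missing = missing_aws_variables()
if missing:
    print("Not visible to Python: " + ", ".join(missing))
else:
    print("Both variables are visible to Python")
```

If a name shows up as missing here while `echo` still prints it, running `export AWS_SECRET_ACCESS_KEY` in the same shell before launching the script is worth trying.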



-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
Impetus Infotech (india) Pvt Ltd.

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by prasenjit mukherjee <pr...@gmail.com>.
Thanks for the suggestion.  Now I am getting the following error with
Cloudera's distro, even though I have set AWS_SECRET_KEY appropriately.
Any pointers?

pmukherjee@ubuntu:~/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta$ echo $AWS_SECRET_ACCESS_KEY
<.........snipped......................................>
pmukherjee@ubuntu:~/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta$ ./hadoop-ec2 list
Traceback (most recent call last):
  File "./hadoop-ec2", line 124, in <module>
    list_all()
  File "/home/pmukherjee/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta/hadoop/ec2/commands.py", line 43, in list_all
    clusters = get_clusters_with_role(MASTER)
  File "/home/pmukherjee/apps/cloudera-for-hadoop-on-ec2-py-0.2.0-beta/hadoop/ec2/cluster.py", line 29, in get_clusters_with_role
    all = EC2Connection().get_all_instances()
  File "/usr/lib/python2.5/site-packages/boto/ec2/connection.py", line 69, in __init__
    self.region.endpoint, debug, https_connection_factory, path)
  File "/usr/lib/python2.5/site-packages/boto/connection.py", line 446, in __init__
    debug,  https_connection_factory, path)
  File "/usr/lib/python2.5/site-packages/boto/connection.py", line 169, in __init__
    self.hmac = hmac.new(self.aws_secret_access_key, digestmod=sha)
AttributeError: EC2Connection instance has no attribute 'aws_secret_access_key'

-Prasen

Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Zak Stone <zs...@gmail.com>.
In my experience, the Cloudera distributions are excellent, actively
developed, and well-supported.

Zak



Re: which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by Mark Kerzner <ma...@gmail.com>.
My personal experience led me to prefer Cloudera. I can't speak for every
situation, but for me the Hadoop distro had many bugs and was unreliable.

Mark


which hadoop-ec2 is preferred ( cloudera/hadoop ? )

Posted by prasenjit <pr...@gmail.com>.
It seems there are 2 hadoop-ec2 scripts:

1) One which comes along with the hadoop distro :
<hadoop>/src/contrib/ec2/bin/hadoop-ec2

2) Another which is downloadable from
http://cloudera-packages.s3.amazonaws.com/cloudera-for-hadoop-on-ec2-py-0.3.0-beta.tar.gz
and is from cloudera folks. 

I prefer using the base hadoop, as I want to avoid dependencies on
boto/simplejson, which are required for (2).  My question is: are they planned
to be kept in sync? Which one is under active development and hence suggested
(given my preference for hadoop's contrib package) for stable use?

-Thanks,
Prasen

-- 
View this message in context: http://old.nabble.com/Which-instance-type-on-Amazon-EC2--tp25667297p27206207.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.