Posted to user@hbase.apache.org by Andrew Purtell <ap...@apache.org> on 2010/03/12 21:26:33 UTC

on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

During the Q&A period after my presentation at HUG9, it was interesting that some in the audience indicated they are running production Hadoop and/or HBase clusters on EC2. I want to follow up on some comments I made there. 

This is a little surprising, because currently the HDFS NameNode is a single point of failure which can bring the whole service down. That the NameNode is a SPOF is not quite so large a concern if you have the ability to engineer the particular server hosting the NameNode to be especially reliable. However, when architecting services on EC2, you must be mindful of its guarantees, or lack thereof. On EC2 the reliability of any given instance is not guaranteed, only the service in the aggregate.

Running Hadoop on top of EC2 in production is thus not advised until there is a good hot fail over solution for the NameNode.

AWS offers a form of hosted Hadoop called Elastic MapReduce: http://aws.amazon.com/elasticmapreduce/. Note this service treats the Hadoop/HDFS cluster as a transient unreliable construction. So should you. 

Regarding a hot fail over solution for the NameNode, there is some really interesting work ongoing at the moment -- "AvatarNode", possibly with inclusion of "BookKeeper" in the architecture. 


    http://hadoopblog.blogspot.com/2010/02/hadoop-namenode-high-availability.html

    http://issues.apache.org/jira/browse/HDFS-976

    http://issues.apache.org/jira/browse/HDFS-234
        http://issues.apache.org/jira/secure/attachment/12399656/create.png
        https://issues.apache.org/jira/browse/ZOOKEEPER-276

Once something like the above is vetted and tested, of course my above advice changes and it would become possible to architect reliable Hadoop/HBase clusters on top of EC2 and similar IaaS clouds.

In the meantime, EC2 and similar IaaS clouds are a great resource for prototyping, research and development, and hosting ephemeral clusters for QA or end to end system tests. The HBase EC2 scripts are a useful tool for doing such things with relative ease. 

Best regards,

   - Andy



----- Original Message ----
From: Jonathan Gray
To: hbase-user@hadoop.apache.org
Sent: Thu, March 11, 2010 3:01:22 PM
Subject: RE: [databasepro-48] HUG9

Pardon the link vomit, hopefully this comes across okay...


HBase Project Update by Jonathan Gray

http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseUpdate_JonathanGray.pdf


HBase and HDFS by Todd Lipcon of Cloudera

http://wiki.apache.org/hadoop/HBase/HBasePresentations?action=AttachFile&do=get&target=HUG9_HBaseAndHDFS_ToddLipcon_Cloudera.pdf


HBase on EC2 by Andrew Purtell of Trend Micro

http://hbase.s3.amazonaws.com/hbase/HBase-EC2-HUG9.pdf


      


RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Jonathan Gray <jl...@streamy.com>.
Just FYI: after sharing this thread with my client, they've decided to go
for some monthly dedicated servers from softlayer.com instead of EC2.  For
one, they will be using lots of inbound traffic and there is a promo for
free inbound, plus 2 TB/mo outbound for free as well.  When you take that
bandwidth into account, it's significantly cheaper than EC2 for 24/7 stuff.
It's also not virtualized, and you can get whatever disks you want.  2
hour turnaround, supposedly.

Will report back how it goes :)

> -----Original Message-----
> From: Andrew Purtell [mailto:apurtell@apache.org]
> Sent: Saturday, March 13, 2010 12:47 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48]
> HUG9)
> 
> [...]



Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@apache.org>.
Hmm...

I know you only used leeware as an example Edward. :-)

I'd caution you have to be careful. Obviously only a subset of low cost
options are suitable and you need to know what you are doing. 

Given this example, leeware servers would be possibly useful but underperforming
for plain MapReduce, due to the Fast Ethernet-only interconnect between the servers
and hardly any disk, and both underperforming and problematic for HBase. Connections
between servers in an HBase cluster should be GigE in my experience, unless you're
planning to serve everything out of RAM (block cache). In that regard, the memory
configuration of leeware servers is not sufficient. Additionally there is not
enough RAM to support HBase and MapReduce tasks on the same servers. There's
hardly any disk to back a table of any size which would justify use of HBase in the
first place.

There are other managed hosting providers that can do GigE interconnect and 
useful disk configurations, but they cost more obviously.


   -  Andy


----- Original Message ----
> From: Edward Capriolo <ed...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Sat, March 13, 2010 8:41:37 AM
> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
[...]
> I have not used EC2 extensively but some of the things you can do are very
> impressive in terms of spin up. As a sys-admin and a guy who worked at a
> data center, I would suggest shopping around. Do not fall in love with EC2
> because it's hip. If you are short on cash, you can get 6 dedicated servers
> for $375.00 USD per month:

> http://www.leeware.com/services.html. (I use leeware for some hosting)
> That is a big difference: 6 servers for $375 vs 1 VM for $500.



      


Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@apache.org>.
Hi Vaibhav,

My advice is for the unaware. :-) 

No implication or disrespect is meant for others.

We have targeted our EC2 scripts at the newcomer, early evaluator, or casual
experimenter, though they can for sure serve as a starting point to build
something more professional/production.  So maybe someone coming to EC2 via our
scripts may not be fully aware of the risks and can use some advice. As you say,
they are not sys admins.

When there is a hot fail over solution for Hadoop that can work pretty much out
of the box, most (all?) of this will be moot.

   - Andy


----- Original Message ----
> From: Vaibhav Puranik <vp...@gmail.com>
> To: hbase-user@hadoop.apache.org
> Sent: Sat, March 13, 2010 12:22:00 PM
> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
[...]
> Thus, it's not true that the people using HBase on EC2 are not aware of the
> risks involved. They are absolutely ok with the risks! It's a choice we have
> made deliberately.


      


Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Vaibhav Puranik <vp...@gmail.com>.
We (GumGum) have been using HBase on EC2 happily for the past 8 months. Here is
why we chose EC2:

(All of these points have been mentioned by Jon before, but I am reiterating
them here. I think it's important that people opposing EC2 understand that
*these considerations were the most important considerations for us*.)

1) The product we were using was experimental.
2) We had no sys admins.
3) A downtime of a few minutes (to bring the cluster back up from a failure)
would not cost us a lot.
4) We were given a short time to bring our software to market.

During those eight months we had only one major failure because of EC2. And
the failure did not happen suddenly. *EC2 warned us that the machine we were
running our namenode on had gone bad and that we should replace it.* We
simply booted a new instance (with our hadoop/hbase bundled in it) and
pointed the rest of the nodes to the new namenode.
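As an illustration of what "pointing the rest of the nodes to the new namenode" involves: each node's core-site.xml carries the NameNode address in `fs.default.name` (the Hadoop 0.20-era property name), so a replacement script only has to rewrite that value on every node and restart the daemons. A hedged sketch, not GumGum's actual procedure:

```python
import re

def repoint_namenode(core_site_xml, new_namenode):
    """Rewrite the NameNode host in a core-site.xml fragment.

    This is simple string surgery, as a node-replacement script might
    do; a real script would also restart the daemons afterwards.
    """
    return re.sub(r"hdfs://[^:<]+:",
                  "hdfs://%s:" % new_namenode,
                  core_site_xml)

conf = """<property>
  <name>fs.default.name</name>
  <value>hdfs://ip-10-0-0-1.ec2.internal:8020</value>
</property>"""
print(repoint_namenode(conf, "ip-10-0-0-9.ec2.internal"))
```

The hostnames and port above are illustrative placeholders.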

With EBS, our data is safe. In fact, because of the *new EBS-backed instance
feature,* it has become even easier to manage hadoop/hbase on EC2.

Thus, it's not true that the people using HBase on EC2 are not aware of the
risks involved. They are absolutely ok with the risks! It's a choice we have
made deliberately.

Regards,
Vaibhav Puranik
http://aws-musings.com/




On Sat, Mar 13, 2010 at 11:28 AM, Bradford Stephens <
bradfordstephens@gmail.com> wrote:

> [...]

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Bradford Stephens <br...@gmail.com>.
I don't recommend our customers use EC2 -- especially when you can buy
last-gen 8 core / 8 GB-16GB boxes for $1000-$2000 each, which is all
HBase needs to be happy (unless you're running something like su.pr).

That being said, we're prototyping and building EC2 mgmt scripts,
because a lot of customers want to try out our platform there.

In fact, we're rolling out EBS + HBase management on Crane, which is
cloud management using Clojure.

-B

On Sat, Mar 13, 2010 at 1:13 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Prasen,
>
> You could definitely do something like that.  As long as you keep everything
> for your Hadoop/HBase setup to use EBS volumes, you should be able to spin
> the cluster down, turn off the nodes, and then bring them back up at a later
> time with all the data still intact.
>
> JG
>
>> -----Original Message-----
>> From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
>> Sent: Saturday, March 13, 2010 8:42 AM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48]
>> HUG9)
>>
>> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee
>> <pr...@gmail.com> wrote:
>> > I agree that running 24/7 hbase servers on ec2 is not advisable. But I
>> > need some suggestions for running mapred-jobs (in batches) followed
>> > by updating the results on an existing hbase server.
>> >
>> > Is it advisable to use EBS drives (attached to each different slave)
>> > and have them configured as the HDFS storage directory? And then use
>> > hbase on top of it. I am assuming that ec2 clusters can be shut down
>> > and restarted (at a later point in time) to use the same hbase.
>> >
>> > -Prasen
>> >
>> > On Sat, Mar 13, 2010 at 1:56 AM, Andrew Purtell <ap...@apache.org>
>> wrote:
>> >> [...]
>> >
>>
>> I have not used EC2 extensively but some of the things you can do are
>> very impressive in terms of spin up.
>>
>> As a sys-admin and a guy who worked at a data center, I would suggest
>> shopping around. Do not fall in love with EC2 because it's hip. If you
>> are short on cash, you can get 6 dedicated servers for $375.00 USD
>> per month:
>> http://www.leeware.com/services.html. (I use leeware for some hosting)
>> That is a big difference: 6 servers for $375 vs 1 VM for $500.
>>
>> I am not saying use service X or service Y, but I do not see much
>> value. If you have a small strong ops team with
>> kickstart+(puppet/CFengine) you can get that "fast spin up" magic.
>
>
>



-- 
http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Charles Woerner / IMAP <ch...@gmail.com>.
Andrew,

The public DNS hostname for EC2 instances follows a naming convention
based on the IP address, and the EC2 internal DNS system automatically
translates lookups of the public DNS hostnames to the internal IP. So
if you assign an elastic IP to a server, then you may be able to use
the public DNS hostname in your config and avoid the data transfer fee.
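Charles's point can be made concrete. Classic EC2 public DNS names embed the public IP address in the hostname itself; a small sketch (the hostname shape shown is the historical us-east-1 style, and the sample addresses are made up):

```python
import re

def ip_from_public_dns(hostname):
    """Recover the public IP embedded in a classic EC2 public DNS name.

    Names like ec2-203-0-113-25.compute-1.amazonaws.com encode the
    public IP as dash-separated octets after the "ec2-" prefix. Inside
    EC2, resolving that same name through the internal DNS returns the
    *internal* IP, which is what makes the trick described above work.
    """
    m = re.match(r"ec2-(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.", hostname)
    if m is None:
        raise ValueError("not a classic EC2 public DNS name: %s" % hostname)
    return ".".join(m.groups())

print(ip_from_public_dns("ec2-203-0-113-25.compute-1.amazonaws.com"))
```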

--
Thanks,

Charles Woerner

On Mar 13, 2010, at 12:32 PM, Andrew Purtell <ap...@apache.org> wrote:

> The data will be intact, but the config will be invalidated, right?
>
> After a cluster has been suspended and then resumed, all of the assigned IP
> addresses will be different. So this would render all of the Hadoop and
> HBase configuration files invalid. The data will be there but you will have
> to go fix up all of the config files on all of your instances, somehow
> accounting for which is master, which is slave, which is zookeeper. Elastic
> IPs might help, but I wouldn't use them because while instance-to-instance
> data transfers are free, that is NOT the case when elastic IPs are used for
> internal traffic.
>
> This could be automated. You can track the roles of the instances locally,
> make a local "suspend script" which shuts Hadoop and HBase down before you
> suspend the cluster, and make a local "resume script" which remembers the
> role of each instance, logs on to the instance after it has been
> reactivated, performs the appropriate substitutions on Hadoop and HBase
> config files, and then restarts the daemons.
>
> Taking this further:
>
> HBase is almost free of static configuration: The master and the slaves
> need to know the network locations of the ZooKeeper quorum ensemble peers.
> The master needs to know the network location of the HDFS NameNode. At some
> future time if an option for Hadoop configuration hosting in ZooKeeper is
> developed, then the HBase master could learn the address of the NameNode
> from ZK. Presumably the HDFS DataNodes would do the same, and so the only
> static detail for everything would be the network location of the ZK
> ensemble peers. At this point you could write them as DNS hostnames and
> then dynamically update DNS instead of performing a bunch of fixups on
> config files.
>
>    - Andy
>
>
> ----- Original Message ----
> From: Jonathan Gray <jl...@streamy.com>
> To: hbase-user@hadoop.apache.org
> Sent: Sat, March 13, 2010 10:13:08 AM
> Subject: RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
>
> Prasen,
>
> You could definitely do something like that. As long as you keep everything
> for your Hadoop/HBase setup to use EBS volumes, you should be able to spin
> the cluster down, turn off the nodes, and then bring them back up at a later
> time with all the data still intact.
>
> JG
>
>> -----Original Message-----
>> On Sat, Mar 13, 2010 at 7:36 AM, prasenjit mukherjee
>>> I agree that running 24/7 hbase servers on ec2 is not advisable. But I
>>> need some suggestions for running mapred-jobs (in batches) followed
>>> by updating the results on an existing hbase server.
>
>
>
>

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@yahoo.com>.
Useful blog!

Also see https://issues.apache.org/jira/browse/HBASE-2327

It's a start...

Thanks Vaibhav. 

--- On Mon, 3/15/10, Vaibhav Puranik <vp...@gmail.com> wrote:

> From: Vaibhav Puranik
> Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)
> To: hbase-user@hadoop.apache.org
> Date: Monday, March 15, 2010, 2:59 PM
>
> [...]



      


Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Vaibhav Puranik <vp...@gmail.com>.
Andy,

In their current form, the scripts are not useful to the community as they
have a lot of our stuff in them. But I will try to see if I can come up with
something that's useful to others. I am also thinking of writing a blog post
about this on my EC2-related blog (http://aws-musings.com/).

Regards,
Vaibhav

On Sun, Mar 14, 2010 at 3:58 PM, Andrew Purtell <ap...@apache.org> wrote:

> [...]

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@apache.org>.
Hey Vaibhav,

Do you think any of your #2 would be generally useful for others and
something we might fold into the public HBase EC2 scripts? I don't want
to be presumptive, but let me kindly plant the idea...

Best,

   - Andy




----- Original Message ----
From: Vaibhav Puranik <vp...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Sun, March 14, 2010 10:12:16 AM
Subject: Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

[...]


      


Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Vaibhav Puranik <vp...@gmail.com>.
Andrew,

Your point about scripts is right, we have done the following two things to
address it:

1) Bundle our own images with HBase/Hadoop versions and configurations

2) We have written some scripts. Once the cluster is booted, all we have to
do is specify which machines are slaves and which machine is the master.
We have bundled config file templates in the images. The script makes a new
config file from a template with the new internal DNS names.

This solution is not fully automated. It has a manual element, but the time
to launch a new cluster with the existing data is drastically reduced.

We also use these scripts to launch a QA environment (which is essentially a
copy of the production environment).
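The template substitution described above might look something like this sketch (the property names are standard HBase ones, but the template layout and placeholder names are hypothetical, not GumGum's actual scripts):

```python
from string import Template

# Hypothetical hbase-site.xml fragment; ${master} and ${quorum} stand in
# for the internal DNS names discovered after the cluster boots.
HBASE_SITE_TEMPLATE = Template("""\
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://${master}:8020/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>${quorum}</value>
</property>
""")

def render_config(master, quorum_hosts):
    # Fill the template with the new internal DNS names for this boot.
    return HBASE_SITE_TEMPLATE.substitute(
        master=master, quorum=",".join(quorum_hosts))

print(render_config("ip-10-0-0-1.ec2.internal",
                    ["ip-10-0-0-2.ec2.internal", "ip-10-0-0-3.ec2.internal"]))
```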

Regards,
Vaibhav

On Sun, Mar 14, 2010 at 7:49 AM, prasenjit mukherjee
<pr...@gmail.com> wrote:

> [...]

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by prasenjit mukherjee <pr...@gmail.com>.
Thanks Andrew for your comments.  From the responses it seems like nobody
has tried to do this (starting/shutting down EC2 clusters with the
same EBS-backed HDFS/HBase data).  It also seems to require some
automated scripts to dynamically attach the EBS drives, one to each
slave.

Do we have anything from Cloudera folks in this regard yet ?

-Prasen


Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@apache.org>.
I thought about this some more and remembered we substitute the internal
DNS names allocated by EC2 into the config files, not IP addresses (and
that the EC2 internal DNS names embed what looks like a MAC address not an
IP address). So as long as the internal DNS name on an instance is stable
through suspension and resumption, the only pre-suspend and post-resume
steps necessary are graceful shutdown of the daemons and subsequent
relaunch, respectively. 

Graceful shutdown of the daemons prior to suspension will be necessary due
to how Hadoop and HBase services monitor their internal function and
trigger recovery actions.
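A sketch of what such a pre-suspend pass could look like. The hostnames, install paths, and the instance-id lookup are illustrative placeholders, not a tested recipe, and the script only echoes its commands unless RUN is cleared:

```shell
#!/bin/sh
# Gracefully stop HBase, then Hadoop, then the instances, so nothing
# trips the failure detectors mid-suspend. Dry-run by default.
RUN=${RUN:-echo}    # set RUN= (empty) to actually execute

MASTER=domU-12-31-39-00-64-51.compute-1.internal
SLAVES="domU-12-31-39-00-64-52.compute-1.internal"

instance_id_of() {
    # Hypothetical lookup of internal DNS name -> instance id, e.g. from
    # a mapping file written at launch time; stubbed out here.
    echo i-00000000
}

suspend_cluster() {
    # HBase depends on HDFS, so stop it first.
    $RUN ssh "$MASTER" /usr/local/hbase/bin/stop-hbase.sh
    $RUN ssh "$MASTER" /usr/local/hadoop/bin/stop-all.sh
    for host in $MASTER $SLAVES; do
        $RUN ec2-stop-instances "$(instance_id_of "$host")"
    done
}

suspend_cluster
```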

Maybe someone can experiment, confirm or refute, and then share their
experiences? 

   - Andy





Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Andrew Purtell <ap...@apache.org>.
The data will be intact, but the config will be invalidated, right? 

After a cluster has been suspended and then resumed, all of the assigned IP
addresses will be different. So this would render all of the Hadoop and
HBase configuration files invalid. The data will be there but you will have
to go fix up all of the config files on all of your instances, somehow
accounting for which is master, which is slave, which is zookeeper. Elastic
IPs might help, but I wouldn't use them because while instance-to-instance
data transfers are free, that is NOT the case when elastic IPs are used for
internal traffic. 


This could be automated. You can track the roles of the instances locally,
make a local "suspend script" which shuts Hadoop and HBase down before you
suspend the cluster, and make a local "resume script" which remembers the
role of each instance, logs on to the instance after it has been
reactivated, performs the appropriate substitutions on Hadoop and HBase
config files, and then restarts the daemons. 
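The substitution step of such a resume script could be as simple as a sed pass over the config files. A minimal sketch; the file contents and the OLD/NEW names are made up, and a real script would discover the new internal DNS names from the EC2 API:

```shell
#!/bin/sh
# Rewrite a stale address in a Hadoop config file in place.
fixup_config() {
    # usage: fixup_config OLD NEW FILE
    old=$1; new=$2; file=$3
    sed "s|$old|$new|g" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Example: repoint fs.default.name at the namenode's post-resume address.
cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://OLD-NAMENODE:9000</value>
  </property>
</configuration>
EOF

fixup_config OLD-NAMENODE domU-new-master.compute-1.internal core-site.xml
grep '<value>' core-site.xml
```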

Taking this further:

HBase is almost free of static configuration: The master and the slaves 
need to know the network locations of the ZooKeeper quorum ensemble peers.
The master needs to know the network location of the HDFS NameNode. At some
future time if an option for Hadoop configuration hosting in ZooKeeper is
developed, then the HBase master could learn the address of the NameNode
from ZK. Presumably the HDFS DataNodes would do the same, and so the only
static detail for everything would be the network location of the ZK
ensemble peers. At this point you could write them as DNS hostnames and
then dynamically update DNS instead of performing a bunch of fixups on
config files. 
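For the ZooKeeper case that could look like the following, assuming you control a zone such as mycluster.example.com (a made-up domain) and can repoint its records on resume with your DNS provider's update mechanism:

```shell
#!/bin/sh
# Write the quorum as stable DNS names once; only DNS changes on resume.
cat > hbase-site.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.mycluster.example.com,zk2.mycluster.example.com,zk3.mycluster.example.com</value>
  </property>
</configuration>
EOF

# After resume, a dynamic DNS update (e.g. nsupdate against your own
# zone) repoints zk1..zk3 at the new internal addresses; the config
# files never need to be touched.
grep 'hbase.zookeeper.quorum' hbase-site.xml
```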

    - Andy




RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Jonathan Gray <jl...@streamy.com>.
Prasen,

You could definitely do something like that.  As long as you keep everything
for your Hadoop/HBase setup to use EBS volumes, you should be able to spin
the cluster down, turn off the nodes, and then bring them back up at a later
time with all the data still intact.

JG




Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Edward Capriolo <ed...@gmail.com>.

I have not used EC2 extensively, but some of the things you can do are
very impressive in terms of spin-up.

As a sysadmin and a guy who worked at a data center, I would suggest
you shop around. Do not fall in love with EC2 because it's hip,
especially if you are short on cash. You can get 6 dedicated servers
for $375.00 USD per month:
http://www.leeware.com/services.html (I use Leeware for some hosting).
That is a big difference: 6 servers for $375 vs. 1 VM for $500.

I am not saying use service X or service Y, but I do not see much
value. If you have a small, strong ops team with
kickstart + (Puppet/CFEngine) you can get that "fast spin up" magic.

Re: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by prasenjit mukherjee <pr...@gmail.com>.
I agree that running 24/7 HBase servers on EC2 is not advisable, but I
need some suggestions for running MapReduce jobs (in batches) followed
by updating the results on an existing HBase server.

Is it advisable to use EBS drives (attached to each slave) and have
them configured as the HDFS storage directory, and then run HBase on
top of that? I am assuming that EC2 clusters can be shut down and
restarted (at a later point in time) to use the same HBase data.
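The setup being asked about could be sketched roughly like this on each slave. The device name, mount point, and storage path are assumptions, not a tested recipe, and the mount is echoed rather than executed:

```shell
#!/bin/sh
# Mount the attached EBS volume and point the DataNode's storage at it
# so HDFS blocks survive cluster shutdown/restart. Dry-run by default.
MOUNT=${MOUNT:-echo}    # set MOUNT= (empty) to really mount

$MOUNT mount /dev/sdf /ebs    # EBS volume attached earlier as /dev/sdf

# Point dfs.data.dir at the EBS mount.
cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/ebs/hadoop/dfs/data</value>
  </property>
</configuration>
EOF
grep '<value>' hdfs-site.xml
```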

-Prasen


RE: on Hadoop reliability wrt. EC2 (was: Re: [databasepro-48] HUG9)

Posted by Jonathan Gray <jl...@streamy.com>.
From a boots-on-the-ground perspective (after consulting with bunches of
small companies around Hadoop and HBase), there are a TON of people running
in production on EC2.  Far more people are doing that than buying their own
hardware.

I strongly advise them against it in almost all cases but it tends not to
matter so much.  They want to get to market as fast as possible, they have
no sys admins, and they want to keep costs low in the short-term.  They're
also willing to take the risk.

I agree it's not smart and Streamy never went that route for your reasons
and cost reasons.

It's awesome for elastic applications, but makes far less sense for 24/7
production clusters.

Startups tend to think one month at a time and hardware outlays are costly
in time and money, in the short-term.


One thing we could do to help educate people better, besides what you did at
the HUG which was really valuable, would be to do some cost analysis.  That
kind of stuff is hard to get done in an OSS project but could help deter
people from thinking EC2 is smart because it will save them money.

When you tell them that a single node failure will kill the cluster for some
time and could even kill the data, they tend to think it won't happen to
them or that they will cross that bridge when they come to it.

JG
