Posted to user@hbase.apache.org by Fred Zappert <fz...@gmail.com> on 2009/06/16 22:53:08 UTC

Are there any single points of failure in an HBase configuration

Hi,

We're considering HBase for a customer-facing SaaS.  I saw some references
to the Master instance and failure/failover scenarios on this list.

We would be running this across at least two data centers in different
cities or states.

Which leads to the following questions:

1. Are there any single points of failure in an HBase configuration?

2. What would be the impact of one data center being down?

3. What would be the recovery time, and what is the procedure to restore normal
operation on a new master?

There are approximately 4M transactions/day.

Thanks,

Fred.

Re: Are there any single points of failure in an HBase configuration

Posted by Andrew Purtell <ap...@apache.org>.
Hi,

We would definitely not do 2PC for HBase replication. See HBASE-1295 for what is currently on deck. 
    https://issues.apache.org/jira/browse/HBASE-1295
That's not the final word; it's a work in progress. I've been editing the attached deck this afternoon, in fact. Ryan and Jon Gray (and others) may have different/additional ideas.

  - Andy
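
As a rough illustration of that direction (shipping edits to the peer cluster asynchronously in the background, rather than coordinating every write with 2PC), here is a toy Java sketch. It is not the HBASE-1295 design or code; the Edit and PeerCluster types are invented for illustration.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy sketch only: an asynchronous edit shipper. Edit and PeerCluster are
    // invented stand-ins, not HBase classes.
    public class AsyncEditShipper implements Runnable {

        public static class Edit {
            final byte[] row;
            final byte[] payload;
            Edit(byte[] row, byte[] payload) { this.row = row; this.payload = payload; }
        }

        // Stand-in for an RPC stub to the remote cluster.
        public interface PeerCluster {
            void replicate(List<Edit> batch) throws Exception;
        }

        private final BlockingQueue<Edit> pending = new LinkedBlockingQueue<Edit>();
        private final PeerCluster peer;

        public AsyncEditShipper(PeerCluster peer) { this.peer = peer; }

        // Called on the local write path; never blocks on the WAN link.
        public void append(Edit edit) { pending.add(edit); }

        // Background loop: drain edits, ship in batches, retry while the peer is down.
        public void run() {
            List<Edit> batch = new ArrayList<Edit>();
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    if (batch.isEmpty()) {
                        batch.add(pending.take());      // wait for at least one edit
                        pending.drainTo(batch, 999);    // then grab up to a batch
                    }
                    peer.replicate(batch);              // ship to the remote cluster
                    batch.clear();                      // success: forget shipped edits
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                } catch (Exception peerDown) {
                    // Remote site unreachable: keep the batch and retry shortly.
                    // Local writes keep flowing into the queue in the meantime.
                    try { Thread.sleep(1000); } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                    }
                }
            }
        }
    }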





Re: Are there any single points of failure in an HBase configuration

Posted by Ryan Rawson <ry...@gmail.com>.
Any replication support in 0.21 won't be anything close to 2PC, since 2PC is more
or less a nightmare.

But your approach sounds good in the meantime.

Good luck!
-ryan
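
For anyone curious why 2PC across datacenters gets called a nightmare, here is a toy Java sketch of a coordinator; the Participant interface is invented for illustration and is not HBase code. Every write blocks on a prepare vote from every site, so one slow or unreachable datacenter stalls or aborts the whole transaction, and a failure between prepare and commit leaves prepared sites in limbo.

    import java.util.List;

    // Toy sketch only: a blocking two-phase commit coordinator.
    // Participant is an invented interface, not anything in HBase.
    public class TwoPhaseCommitSketch {

        public interface Participant {
            boolean prepare(byte[] txn) throws Exception;  // WAN round trip #1
            void commit(byte[] txn) throws Exception;      // WAN round trip #2
            void abort(byte[] txn);
        }

        // Returns true only if every datacenter voted yes and was told to commit.
        public static boolean commitEverywhere(byte[] txn, List<Participant> sites) {
            // Phase 1: block on a prepare vote from every site.
            for (Participant site : sites) {
                try {
                    if (!site.prepare(txn)) {
                        abortAll(txn, sites);
                        return false;
                    }
                } catch (Exception unreachable) {
                    // One down WAN link and the whole write aborts (or hangs on timeouts).
                    abortAll(txn, sites);
                    return false;
                }
            }
            // Phase 2: every site has promised, so tell them all to commit.
            for (Participant site : sites) {
                try {
                    site.commit(txn);
                } catch (Exception e) {
                    // The classic 2PC headache: a prepared site is now in limbo
                    // until the coordinator can reach it again.
                }
            }
            return true;
        }

        private static void abortAll(byte[] txn, List<Participant> sites) {
            for (Participant site : sites) {
                site.abort(txn);
            }
        }
    }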


Re: Are there any single points of failure in an HBase configuration

Posted by Fred Zappert <fz...@gmail.com>.
Ryan,

Thanks for the information.

In terms of replication support, I had already recommended handling replication by
having the transactions processed at both data centers through a message queue
(top-level replication), so that no database-level replication is required.

I've seen too many limitations in database-level replication over the years in the
RDBMS field to want to deal with that again.

This application is amenable to that approach since most of it involves data
collection (inserts and simple updates), with no requirement for two-phase
commit that we've discovered yet.

Regards,

Fred.
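
For what it's worth, a minimal Java sketch of the shape of that queue-based approach is below. The BlockingQueues stand in for a durable message broker and the Maps stand in for each datacenter's HBase table; the Txn class and all names are invented, and keying by transaction id is what keeps redelivered messages idempotent for this insert/simple-update workload.

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.LinkedBlockingQueue;

    // Toy sketch of "top-level" replication through a message queue: publish each
    // transaction once per datacenter, and let each datacenter apply it locally.
    public class QueueReplicationSketch {

        // One collected transaction; txnId doubles as the (idempotent) row key.
        public static class Txn {
            final String txnId;
            final String payload;
            Txn(String txnId, String payload) { this.txnId = txnId; this.payload = payload; }
        }

        // Publisher side: enqueue the same transaction for every datacenter.
        public static void publish(Txn txn, List<BlockingQueue<Txn>> perDcQueues)
                throws InterruptedException {
            for (BlockingQueue<Txn> queue : perDcQueues) {
                queue.put(txn);
            }
        }

        // Consumer side, one per datacenter: apply the transaction to the local store.
        public static void consumeOne(BlockingQueue<Txn> queue, Map<String, String> localStore)
                throws InterruptedException {
            Txn txn = queue.take();
            // Keyed by txnId, so a redelivered message overwrites the same row
            // instead of double-counting (inserts and simple updates only).
            localStore.put(txn.txnId, txn.payload);
        }

        public static void main(String[] args) throws InterruptedException {
            BlockingQueue<Txn> dcEast = new LinkedBlockingQueue<Txn>();
            BlockingQueue<Txn> dcWest = new LinkedBlockingQueue<Txn>();
            Map<String, String> eastStore = new ConcurrentHashMap<String, String>();
            Map<String, String> westStore = new ConcurrentHashMap<String, String>();

            publish(new Txn("txn-0001", "signup:alice"), Arrays.asList(dcEast, dcWest));
            consumeOne(dcEast, eastStore);
            consumeOne(dcWest, westStore);

            System.out.println(eastStore.equals(westStore));  // true: both sites converge
        }
    }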


Re: Are there any single points of failure in an HBase configuration

Posted by Ryan Rawson <ry...@gmail.com>.
I haven't heard of it, but it would be nice :-)



On Tue, Jun 16, 2009 at 2:21 PM, Ski Gh3 <sk...@gmail.com> wrote:

> I thought HDFS would fix the namenode as a SPOF, just as HBase fixes the master
> in 0.20, so is that still not there yet?

Re: Are there any single points of failure in an HBase configuration

Posted by Ski Gh3 <sk...@gmail.com>.
I thought HDFS would fix the namenode as a SPOF, just as HBase fixes the master
in 0.20, so is that still not there yet?


Re: Are there any single points of failure in an HBase configuration

Posted by Ryan Rawson <ry...@gmail.com>.
HBase itself doesn't strictly have any SPOF in 0.20.  It supports multiple-master
failover, etc.

HBase depends on HDFS, which does have a SPOF in its namenode.  If that
goes down, everything is down.  Generally speaking, the namenode is reliable,
but the hardware is the issue.  You can have a quick recovery, but there is
still an outage.

HBase isn't explicitly designed to run across a WAN split between two
datacenters.  It's certainly possible, but during certain link-down
scenarios you are looking at cluster splits.  The HBase master will decide that
the regionservers it can no longer reach have died, and HDFS will assume the
lost datanodes are gone and start re-replicating data.

In HBase 0.21, we are hoping to have replication support between clusters.

-ryan
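
To make the read/write path concrete, here is a minimal 0.20-era client sketch; the hostnames, table name, and column family are invented. The point is that the client bootstraps off the ZooKeeper quorum and then talks to regionservers directly, so a master failing over to a backup does not interrupt ordinary puts and gets, whereas a namenode outage stops these writes no matter how many masters are running.

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    // Minimal 0.20-era client sketch; table, family, and hosts are invented.
    public class ClientPathSketch {
        public static void main(String[] args) throws Exception {
            HBaseConfiguration conf = new HBaseConfiguration();
            // The client entry point is the ZooKeeper quorum, not a master address.
            conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");

            HTable table = new HTable(conf, "transactions");
            Put put = new Put(Bytes.toBytes("txn-0001"));
            put.add(Bytes.toBytes("data"), Bytes.toBytes("payload"), Bytes.toBytes("signup:alice"));
            table.put(put);  // served by a regionserver, persisted through HDFS
        }
    }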

