Posted to user@hbase.apache.org by David Van Couvering <da...@vancouvering.com> on 2009/02/26 19:17:51 UTC

HBase and failure notification

Hey, all.  I'm doing a bit of a survey of distributed key/value stores out
there.  HBase looks pretty interesting, nice to see an open source version
of BigTable out there.

HBase is obviously clustered, but what I can't figure out is how it does
cluster management.  It looks like you have to configure it to tell it all
the machines that have region servers, and that implies to me that *you*
have to start and manage the region servers - HBase doesn't do any of that
for you.  So I think that means that it doesn't have any node monitoring
support - you have to have your own monitoring system that detects failed
nodes and notifies you and/or restarts them for you.

Also, the architecture document says "if [the master server] detects a
HRegionServer is no longer reachable, it will split the HRegionServer's
write-ahead log so that there is now one write-ahead log for each region
that the HRegionServer was serving. After it has accomplished this, it will
reassign the regions that were being served by the unreachable
HRegionServer"

This seems to imply that even though the HRegionServer is unreachable,
somehow its write-ahead log and the regions it was serving are.  Perhaps I
don't fully understand HDFS, but is this a guarantee when the node hosting
the HRegionServer is down?  What happens if you can't get to the write-ahead
log and/or some of the regions the region server was serving?

Thanks,

David

-- 
David W. Van Couvering

I am looking for a senior position working on server-side Java systems.
 Feel free to contact me if you know of any opportunities.

http://www.linkedin.com/in/davidvc
http://davidvancouvering.blogspot.com
http://twitter.com/dcouvering



Re: HBase and failure notification

Posted by David Van Couvering <da...@gmail.com>.
Thanks for the answers, St.Ack.  That name is very very familiar, and I am
married to a woman named Linda.  Look for me on Facebook :)

I'll look into HDFS to understand the failure semantics for things like
network partitions, etc.

David

On Thu, Feb 26, 2009 at 10:54 AM, stack <st...@duboce.net> wrote:

> On Thu, Feb 26, 2009 at 10:17 AM, David Van Couvering <
> david@vancouvering.com> wrote:
>
> >
> > HBase is obviously clustered, but what I can't figure out is how it does
> > cluster management.  It looks like you have to configure it to tell it all
> > the machines that have region servers, and that implies to me that *you*
> > have to start and manage the region servers - HBase doesn't do any of that
> > for you.  So I think that means that it doesn't have any node monitoring
> > support - you have to have your own monitoring system that detects failed
> > nodes and notifies you and/or restarts them for you.
> >
>
>
> It'll start them all for you.  If one dies, it deals with reallocating the
> downed server's regions.  It doesn't call the data center to schedule the
> disk replacement for you (smile).
>
>
>
> > Also, the architecture document says "if [the master server] detects a
> > HRegionServer is no longer reachable, it will split the HRegionServer's
> > write-ahead log so that there is now one write-ahead log for each region
> > that the HRegionServer was serving. After it has accomplished this, it will
> > reassign the regions that were being served by the unreachable
> > HRegionServer"
> >
> > This seems to imply that even though the HRegionServer is unreachable,
> > somehow its write-ahead log and the regions it was serving are.  Perhaps I
> > don't fully understand HDFS, but is this a guarantee when the node hosting
> > the HRegionServer is down?  What happens if you can't get to the
> > write-ahead log and/or some of the regions the region server was serving?
>
>
> Its log is written into the HDFS, a distributed file system that by default
> replicates all that is written to it.  A member of the HDFS cluster might go
> down and take some data with it but because the data is replicated, when the
> commit log is replayed, it'll be using one of the still online replicas.
>
> (Do you know a woman named Linda?)
>
> St.Ack
>




Re: HBase and failure notification

Posted by stack <st...@duboce.net>.
On Thu, Feb 26, 2009 at 10:17 AM, David Van Couvering <
david@vancouvering.com> wrote:

>
> HBase is obviously clustered, but what I can't figure out is how it does
> cluster management.  It looks like you have to configure it to tell it all
> the machines that have region servers, and that implies to me that *you*
> have to start and manage the region servers - HBase doesn't do any of that
> for you.  So I think that means that it doesn't have any node monitoring
> support - you have to have your own monitoring system that detects failed
> nodes and notifies you and/or restarts them for you.
>


It'll start them all for you.  If one dies, it deals with reallocating the
downed server's regions.  It doesn't call the data center to schedule the
disk replacement for you (smile).



> Also, the architecture document says "if [the master server] detects a
> HRegionServer is no longer reachable, it will split the HRegionServer's
> write-ahead log so that there is now one write-ahead log for each region
> that the HRegionServer was serving. After it has accomplished this, it will
> reassign the regions that were being served by the unreachable
> HRegionServer"
>
> This seems to imply that even though the HRegionServer is unreachable,
> somehow its write-ahead log and the regions it was serving are.  Perhaps I
> don't fully understand HDFS, but is this a guarantee when the node hosting
> the HRegionServer is down?  What happens if you can't get to the
> write-ahead log and/or some of the regions the region server was serving?


Its log is written into the HDFS, a distributed file system that by default
replicates all that is written to it.  A member of the HDFS cluster might go
down and take some data with it but because the data is replicated, when the
commit log is replayed, it'll be using one of the still online replicas.
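
To make the log-splitting step from the architecture document concrete, here
is a minimal sketch in Python.  It is not HBase code: the LogEntry record, its
fields, and split_log are hypothetical names chosen for illustration, and the
real on-disk log format is different.  It only shows the idea of turning one
shared log into one ordered log per region, so each region can be replayed by
whichever server picks it up.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    """One hypothetical write-ahead log record (not HBase's real format)."""
    region: str
    sequence_id: int
    edit: str


def split_log(entries):
    """Group a dead server's log entries into one ordered log per region."""
    per_region = defaultdict(list)
    for entry in entries:
        per_region[entry.region].append(entry)
    # Replay has to apply edits in the order they were originally written.
    for region_entries in per_region.values():
        region_entries.sort(key=lambda e: e.sequence_id)
    return dict(per_region)


# A single log shared by every region the failed server was hosting.
shared_log = [
    LogEntry("region-a", 1, "put row1"),
    LogEntry("region-b", 2, "put row9"),
    LogEntry("region-a", 3, "delete row1"),
]
per_region_logs = split_log(shared_log)
```

In this sketch the input list stands in for a log file read from one of the
surviving HDFS replicas, which is why the split can still happen after the
original server is gone.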

(Do you know a woman named Linda?)

St.Ack

Re: HBase and failure notification

Posted by David Van Couvering <da...@gmail.com>.
Cool stuff, thanks!

David

On Thu, Feb 26, 2009 at 11:13 AM, Jim Kellerman (POWERSET) <
Jim.Kellerman@microsoft.com> wrote:

> > -----Original Message-----
> > From: david.vancouvering@gmail.com [mailto:david.vancouvering@gmail.com]
> > On Behalf Of David Van Couvering
> > Sent: Thursday, February 26, 2009 10:18 AM
> > To: hbase-user@hadoop.apache.org
> > Subject: HBase and failure notification
> >
> > Hey, all.  I'm doing a bit of a survey of distributed key/value stores out
> > there.  HBase looks pretty interesting, nice to see an open source version
> > of BigTable out there.
> >
> > HBase is obviously clustered, but what I can't figure out is how it does
> > cluster management.  It looks like you have to configure it to tell it all
> > the machines that have region servers, and that implies to me that *you*
> > have to start and manage the region servers - HBase doesn't do any of that
> > for you.
>
> There are start and stop scripts that will start up the master and region
> servers.
>
> > So I think that means that it doesn't have any node monitoring
> > support - you have to have your own monitoring system that detects
> > failed nodes and notifies you and/or restarts them for you.
>
> HBase has a web UI that you can use to monitor the state of the cluster.
> The master does detect when a region server becomes unreachable.
>
> If you mean machine failure, though, HBase has no built-in monitoring;
> you can use Ganglia to monitor the hardware status. HBase can also
> feed metrics to Ganglia.
>
> >
> > Also, the architecture document says "if [the master server] detects a
> > HRegionServer is no longer reachable, it will split the HRegionServer's
> > write-ahead log so that there is now one write-ahead log for each region
> > that the HRegionServer was serving. After it has accomplished this, it will
> > reassign the regions that were being served by the unreachable
> > HRegionServer"
> >
> > This seems to imply that even though the HRegionServer is unreachable,
> > somehow its write-ahead log and the regions it was serving are.  Perhaps I
> > don't fully understand HDFS, but is this a guarantee when the node hosting
> > the HRegionServer is down?  What happens if you can't get to the
> > write-ahead log and/or some of the regions the region server was serving?
>
> HDFS replicates data to multiple machines (3 by default), so unless you
> have a catastrophic outage, it is very unlikely that the data will be
> completely unreachable.
>
> > Thanks,
> >
> > David
> >
>




RE: HBase and failure notification

Posted by "Jim Kellerman (POWERSET)" <Ji...@microsoft.com>.
> -----Original Message-----
> From: david.vancouvering@gmail.com [mailto:david.vancouvering@gmail.com]
> On Behalf Of David Van Couvering
> Sent: Thursday, February 26, 2009 10:18 AM
> To: hbase-user@hadoop.apache.org
> Subject: HBase and failure notification
> 
> Hey, all.  I'm doing a bit of a survey of distributed key/value stores out
> there.  HBase looks pretty interesting, nice to see an open source version
> of BigTable out there.
>
> HBase is obviously clustered, but what I can't figure out is how it does
> cluster management.  It looks like you have to configure it to tell it all
> the machines that have region servers, and that implies to me that *you*
> have to start and manage the region servers - HBase doesn't do any of that
> for you.

There are start and stop scripts that will start up the master and region
servers.

> So I think that means that it doesn't have any node monitoring
> support - you have to have your own monitoring system that detects
> failed nodes and notifies you and/or restarts them for you.

HBase has a web UI that you can use to monitor the state of the cluster.
The master does detect when a region server becomes unreachable.

If you mean machine failure, though, HBase has no built-in monitoring;
you can use Ganglia to monitor the hardware status. HBase can also
feed metrics to Ganglia.
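
As a rough illustration of how a master can notice that a region server has
become unreachable, here is a small Python sketch of lease-style failure
detection.  This is an assumption about the general technique, not a
description of HBase's actual implementation: LeaseTracker, heartbeat, expired
and the 30-second lease are all made-up names and values.

```python
import time


class LeaseTracker:
    """Hypothetical master-side bookkeeping; names are illustrative only."""

    def __init__(self, lease_seconds, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.last_seen = {}

    def heartbeat(self, server):
        """Record that a region server just reported in."""
        self.last_seen[server] = self.clock()

    def expired(self):
        """Return the servers whose lease has run out, sorted by name."""
        now = self.clock()
        return sorted(
            server
            for server, seen in self.last_seen.items()
            if now - seen > self.lease_seconds
        )


# Drive the tracker with a fake clock so the example is deterministic.
now = [0.0]
tracker = LeaseTracker(lease_seconds=30, clock=lambda: now[0])
tracker.heartbeat("rs1")
tracker.heartbeat("rs2")
now[0] = 20.0
tracker.heartbeat("rs2")          # rs2 keeps reporting in, rs1 goes silent
now[0] = 40.0
unreachable = tracker.expired()   # rs1 was last seen 40s ago, rs2 20s ago
```

Note that this only tells the master a server stopped reporting.  It cannot
say whether the process died, the machine failed, or the network dropped,
which is where a hardware monitor such as Ganglia comes in.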

> 
> Also, the architecture document says "if [the master server] detects a
> HRegionServer is no longer reachable, it will split the HRegionServer's
> write-ahead log so that there is now one write-ahead log for each region
> that the HRegionServer was serving. After it has accomplished this, it will
> reassign the regions that were being served by the unreachable
> HRegionServer"
>
> This seems to imply that even though the HRegionServer is unreachable,
> somehow its write-ahead log and the regions it was serving are.  Perhaps I
> don't fully understand HDFS, but is this a guarantee when the node hosting
> the HRegionServer is down?  What happens if you can't get to the
> write-ahead log and/or some of the regions the region server was serving?

HDFS replicates data to multiple machines (3 by default), so unless you 
have a catastrophic outage, it is very unlikely that the data will be
completely unreachable.
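
A back-of-the-envelope sketch of why three replicas make total loss unlikely,
in Python.  It assumes nodes fail independently, which real clusters only
approximate (a rack or switch failure can take out several nodes at once), and
the 1% per-node figure is a made-up number for illustration, not a measured
one.  The function name all_replicas_down is likewise hypothetical.

```python
def all_replicas_down(p_node_down: float, replicas: int = 3) -> float:
    """Chance that every replica of a block is unavailable at the same time.

    Assumes independent node failures; correlated failures make the real
    figure worse than this estimate.
    """
    if not 0.0 <= p_node_down <= 1.0:
        raise ValueError("p_node_down must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return p_node_down ** replicas


# Hypothetical 1% chance that any given node is down at a given moment.
single = all_replicas_down(0.01, replicas=1)
triple = all_replicas_down(0.01)  # default of 3 replicas
```

Under these assumptions, going from one copy to three takes the chance of the
log being completely unreachable from about one in a hundred to about one in a
million.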

> Thanks,
> 
> David
> 