Posted to user@hbase.apache.org by Joseph Coleman <jo...@infinitecampus.com> on 2011/02/07 20:04:51 UTC

Master node Question

I am in the process of building out a clustered environment that is going to start very small but grow extremely fast. I am just trying to figure out the following:

1. How many master nodes can I have?

2. If the answer is only one, what happens if the master fails or dies? Does a slave take over, and if so, is the slave just referenced in the master files?

Thanks for your help; I am just trying to understand this before I order my equipment. The build-out will be a couple of master nodes for redundancy, 10 data nodes running HDFS and HBase, and a 2 to 3 node ZooKeeper cluster. We anticipate that our cluster will grow to 80 nodes before year's end, which is why I am planning the setup the way I am. If anyone has feedback for a different setup, please let me know. Also, if there is a doc that covers this, that would be beneficial.

Thanks for everyone's help; I do appreciate it.


RE: Master node Question

Posted by Jonathan Gray <jg...@fb.com>.
Unfortunately, this is not the case for the HDFS NameNode.  HDFS does not support backup master nodes out of the box.  There are a few different techniques people use for NameNode high availability; you should be able to find write-ups with a Google search, though none of them are especially simple.

In general, people run the NameNode on a more fault-tolerant node: dual PSUs, RAID, etc.  I'd recommend at least mirrored RAID and dual power supplies on the master nodes.
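As one concrete mitigation (a sketch only; the directory paths and the NFS mount are illustrative, not from this thread), the NameNode can be pointed at multiple metadata directories in hdfs-site.xml, so a single local-disk failure does not lose the filesystem image:

```xml
<!-- hdfs-site.xml: the NameNode writes its image and edit log to every
     directory listed, so losing one disk does not lose the metadata.
     /mnt/nfs/dfs/nn is an assumed NFS mount for an off-machine copy. -->
<property>
  <name>dfs.name.dir</name>
  <value>/data/1/dfs/nn,/data/2/dfs/nn,/mnt/nfs/dfs/nn</value>
</property>
```

This does not give automatic failover, but it keeps a current copy of the metadata that a replacement NameNode can be started from.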

JG

> -----Original Message-----
> From: Joseph Coleman [mailto:joe.coleman@infinitecampus.com]
> Sent: Monday, February 07, 2011 11:38 AM
> To: user@hbase.apache.org
> Subject: Re: Master node Question
> 
> Thank you for the reply that is very helpful I assume this is also the case for
> the HDFS master as well. Just list your masters  MasterA, MasterB and
> MasterC in the masters config files and also have your 2 of the three in the
> slaves as well I take it then execute your start commands from one master
> only example MasterA.
> 
> 
> 
> 
> 
> On 2/7/11 1:29 PM, "Jonathan Gray" <jg...@fb.com> wrote:
> 
> >There is only one active HBase master at any given time, but there can
> >be any number of backup masters.  The failover is automated and
> >coordinated via ZooKeeper.  Regionservers and clients use ZooKeeper to
> >determine who is the current active master.  You can run with as many as
> you want.
> >
> >On larger clusters, I usually recommend around 5 master nodes.  One
> >possible configuration is to have all 5 with ZK (running on their own
> >dedicated spindles), one for the NameNode, one for the
> >SecondaryNameNode, and the other three can be HMasters (one will be
> the
> >normal HMaster, the other backups).
> >
> >JG
> >
> >> -----Original Message-----
> >> From: Joseph Coleman [mailto:joe.coleman@infinitecampus.com]
> >> Sent: Monday, February 07, 2011 11:05 AM
> >> To: user@hbase.apache.org
> >> Subject: Master node Question
> >>
> >> I am in the process of building out a clustered environment that is
> >>going to  start very small but grow extremely fast. I am just trying
> >>to figure out the
> >> following:
> >>
> >> 1. How many master nodes can I have?
> >>
> >> 2. If the answers is only 1 what happens if the master fails or dies.
> >>Does a
> >> slave take over and if so is the slave just referenced in the master
> >>files?
> >>
> >> Thanks for your help I am just trying to understand this before I
> >>order my  equipment. The build out will be a couple master nodes for
> >>redundancy,
> >>10
> >> data nodes running HDFS AND HBASE and a 2 to 3 node zookeeper cluster.
> >> We anticipate that out cluster will grow to 80 nodes before years end
> >>which is  why I am thinking of the setup they way I am if anyone has
> >>feedback for a  different setup please let me know.  Also if there is
> >>a doc that covers that  would be beneficial.
> >>
> >> Thanks for everyones help I do appreciate it.
> >


Re: Master node Question

Posted by Joseph Coleman <jo...@infinitecampus.com>.
Thank you for the reply; that is very helpful. I assume this is also the
case for the HDFS master: just list your masters (MasterA, MasterB, and
MasterC) in the masters config file, include two of the three in the
slaves file as well, and then execute your start commands from one
master only, for example MasterA.





On 2/7/11 1:29 PM, "Jonathan Gray" <jg...@fb.com> wrote:

>There is only one active HBase master at any given time, but there can be
>any number of backup masters.  The failover is automated and coordinated
>via ZooKeeper.  Regionservers and clients use ZooKeeper to determine who
>is the current active master.  You can run with as many as you want.
>
>On larger clusters, I usually recommend around 5 master nodes.  One
>possible configuration is to have all 5 with ZK (running on their own
>dedicated spindles), one for the NameNode, one for the SecondaryNameNode,
>and the other three can be HMasters (one will be the normal HMaster, the
>other backups).
>
>JG
>
>> -----Original Message-----
>> From: Joseph Coleman [mailto:joe.coleman@infinitecampus.com]
>> Sent: Monday, February 07, 2011 11:05 AM
>> To: user@hbase.apache.org
>> Subject: Master node Question
>> 
>> I am in the process of building out a clustered environment that is
>>going to
>> start very small but grow extremely fast. I am just trying to figure
>>out the
>> following:
>> 
>> 1. How many master nodes can I have?
>> 
>> 2. If the answers is only 1 what happens if the master fails or dies.
>>Does a
>> slave take over and if so is the slave just referenced in the master
>>files?
>> 
>> Thanks for your help I am just trying to understand this before I order
>>my
>> equipment. The build out will be a couple master nodes for redundancy,
>>10
>> data nodes running HDFS AND HBASE and a 2 to 3 node zookeeper cluster.
>> We anticipate that out cluster will grow to 80 nodes before years end
>>which is
>> why I am thinking of the setup they way I am if anyone has feedback for
>>a
>> different setup please let me know.  Also if there is a doc that covers
>>that
>> would be beneficial.
>> 
>> Thanks for everyones help I do appreciate it.
>


RE: Master node Question

Posted by Jonathan Gray <jg...@fb.com>.
There is only one active HBase master at any given time, but there can be any number of backup masters.  The failover is automated and coordinated via ZooKeeper.  Regionservers and clients use ZooKeeper to determine who is the current active master.  You can run with as many as you want.

On larger clusters, I usually recommend around 5 master nodes.  One possible configuration is to have all 5 with ZK (running on their own dedicated spindles), one for the NameNode, one for the SecondaryNameNode, and the other three can be HMasters (one will be the normal HMaster, the other backups).
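To make the backup masters start automatically (a minimal sketch; the host names are placeholders), they can be listed one per line in conf/backup-masters, which bin/start-hbase.sh reads:

```
# conf/backup-masters -- one host per line; start-hbase.sh launches a
# backup HMaster on each (masterB and masterC are hypothetical names)
masterB
masterC
```

A master started by hand on any node (bin/hbase-daemon.sh start master) behaves the same way: it waits in ZooKeeper until the active master fails, then takes over.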

JG

> -----Original Message-----
> From: Joseph Coleman [mailto:joe.coleman@infinitecampus.com]
> Sent: Monday, February 07, 2011 11:05 AM
> To: user@hbase.apache.org
> Subject: Master node Question
> 
> I am in the process of building out a clustered environment that is going to
> start very small but grow extremely fast. I am just trying to figure out the
> following:
> 
> 1. How many master nodes can I have?
> 
> 2. If the answers is only 1 what happens if the master fails or dies. Does a
> slave take over and if so is the slave just referenced in the master files?
> 
> Thanks for your help I am just trying to understand this before I order my
> equipment. The build out will be a couple master nodes for redundancy, 10
> data nodes running HDFS AND HBASE and a 2 to 3 node zookeeper cluster.
> We anticipate that out cluster will grow to 80 nodes before years end which is
> why I am thinking of the setup they way I am if anyone has feedback for a
> different setup please let me know.  Also if there is a doc that covers that
> would be beneficial.
> 
> Thanks for everyones help I do appreciate it.


Re: Amazon EC2

Posted by Mark Kerzner <ma...@gmail.com>.
It worked fine for me, with the Cloudera distribution about a year ago.

I would keep the prepared private AMI, with whatever additions go into it,
and start the cluster from that. The AMI served as my release version. The
cluster had occasional problems starting, at least back then, due to
networking, but once it was up it worked great. Of course, you need a
provision to store the HDFS data when the cluster goes down - most likely
in S3 - or just always store it in S3, depending on your requirements.
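As a sketch of that backup step (the bucket name and paths are made up), distcp can copy the HDFS data out to S3 before the cluster is terminated:

```
# Copy HDFS data to S3 before tearing the cluster down; bucket and paths
# are illustrative. s3n:// was the usual S3 filesystem scheme at the time.
hadoop distcp hdfs://namenode:8020/user/data s3n://my-backup-bucket/data
```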

Mark

On Mon, Feb 7, 2011 at 1:25 PM, Peter Haidinyak <ph...@local.com>wrote:

> Hi,
>        We are looking at moving our cluster to Amazon's EC2 solution. Has
> anybody out there already done this or tried and would you have any
> recommendations/warning?
>
> Thanks
>
> -Pete
>

Re: Amazon EC2

Posted by Gary Helmling <gh...@gmail.com>.
I think Jon and Ryan have covered the key points here.

I just want to reiterate that the really valuable aspect of EC2 is the
"elastic" part of the name.  It's great for spinning up a cluster for
testing or batch data processing, without dedicated hardware.  Having the
ability to launch 100 servers on demand is very powerful!  And in these
cases, the economics of EC2 pricing work greatly in your favor.

Where EC2 makes less sense, though, is when you're running an always-on,
24x7 cluster (probably the most frequent scenario for HBase deployments).
You still avoid the up-front capital expenditure for hardware, but the
monthly cost (especially for HBase, where you need the larger and more
costly instance types) will quickly overtake the cost of the hardware. At
the same time, you'll be incurring a performance penalty due to
virtualized IO and contention for resources with other subscribers.

So do some up-front cost calculations based on your expected usage and
service lifetime.  And be aware of the performance penalty and additional
operational complications.
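That calculation can be sketched in a few lines (all prices below are illustrative assumptions, not real EC2 or hardware quotes):

```python
# Hypothetical cost comparison: always-on EC2 cluster vs. buying hardware.
# The hourly rate and per-node hardware cost are assumed figures.

def monthly_ec2_cost(num_nodes, hourly_rate, hours_per_month=730):
    """Cost of running num_nodes instances 24x7 for one month."""
    return num_nodes * hourly_rate * hours_per_month

def breakeven_months(hardware_cost_per_node, num_nodes, hourly_rate):
    """Months of 24x7 EC2 usage after which buying hardware is cheaper."""
    return (hardware_cost_per_node * num_nodes) / monthly_ec2_cost(
        num_nodes, hourly_rate)

# Example: 10 nodes at an assumed $0.68/hr vs. assumed $4,000 servers
ec2_month = monthly_ec2_cost(10, 0.68)     # roughly $4,964/month
months = breakeven_months(4000, 10, 0.68)  # roughly 8 months
```

The crossover comes fast for a fixed-size 24x7 cluster, which is the point being made above; plug in your own numbers.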

--gh

On Mon, Feb 7, 2011 at 11:39 AM, Ryan Rawson <ry...@gmail.com> wrote:

> There are other virtualizing environments that offer better perf/$,
> such as softlayer, rackspace cloud, and more.
>
> EC2 is popular... and hence oversubscribed.  People complain about IO
> perf, and while it's not as bad as some people claim, you have to be
> aware that EC2 isnt some magical land where things work great, there
> are lots of gotchas, slower machines, cluster, etc. Running a high
> performance database on low performance systems will end up with a low
> performance database, you might want to check those expectations at
> the door.
>
> Good luck!
> -ryan
>
> On Mon, Feb 7, 2011 at 11:35 AM, Jonathan Gray <jg...@fb.com> wrote:
> > There are others who have had far more experience than I have with HBase
> + EC2, so will let them chime in.  But I personally recommend against this
> direction if you expect to have a consistent cluster size and/or a
> significant amount of load.
> >
> > EC2 is great at quickly scaling up/down, but is usually not cost
> effective if you're running a cluster of a fixed set of nodes 24/7.
> >
> > EC2 also generally experiences far worse IO performance than dedicated
> hardware, so with any significant load, performance suffers on EC2.
> >
> > In addition, EC2 presents its own operational pains and availability
> issues.  Users on EC2 generally have more problems than those with their own
> setups.
> >
> > JG
> >
> >> -----Original Message-----
> >> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> >> Sent: Monday, February 07, 2011 11:25 AM
> >> To: user@hbase.apache.org
> >> Subject: Amazon EC2
> >>
> >> Hi,
> >>       We are looking at moving our cluster to Amazon's EC2 solution. Has
> >> anybody out there already done this or tried and would you have any
> >> recommendations/warning?
> >>
> >> Thanks
> >>
> >> -Pete
> >
>

Re: Amazon EC2

Posted by Ryan Rawson <ry...@gmail.com>.
There are other virtualized environments that offer better perf/$,
such as SoftLayer, Rackspace Cloud, and more.

EC2 is popular... and hence oversubscribed.  People complain about IO
perf, and while it's not as bad as some people claim, you have to be
aware that EC2 isn't some magical land where things work great; there
are lots of gotchas, slower machines, etc. Running a high performance
database on low performance systems will end up with a low performance
database, so you might want to check those expectations at the door.

Good luck!
-ryan

On Mon, Feb 7, 2011 at 11:35 AM, Jonathan Gray <jg...@fb.com> wrote:
> There are others who have had far more experience than I have with HBase + EC2, so will let them chime in.  But I personally recommend against this direction if you expect to have a consistent cluster size and/or a significant amount of load.
>
> EC2 is great at quickly scaling up/down, but is usually not cost effective if you're running a cluster of a fixed set of nodes 24/7.
>
> EC2 also generally experiences far worse IO performance than dedicated hardware, so with any significant load, performance suffers on EC2.
>
> In addition, EC2 presents its own operational pains and availability issues.  Users on EC2 generally have more problems than those with their own setups.
>
> JG
>
>> -----Original Message-----
>> From: Peter Haidinyak [mailto:phaidinyak@local.com]
>> Sent: Monday, February 07, 2011 11:25 AM
>> To: user@hbase.apache.org
>> Subject: Amazon EC2
>>
>> Hi,
>>       We are looking at moving our cluster to Amazon's EC2 solution. Has
>> anybody out there already done this or tried and would you have any
>> recommendations/warning?
>>
>> Thanks
>>
>> -Pete
>

RE: Amazon EC2

Posted by Jonathan Gray <jg...@fb.com>.
There are others who have had far more experience than I have with HBase + EC2, so will let them chime in.  But I personally recommend against this direction if you expect to have a consistent cluster size and/or a significant amount of load.

EC2 is great at quickly scaling up/down, but is usually not cost effective if you're running a cluster of a fixed set of nodes 24/7.

EC2 also generally experiences far worse IO performance than dedicated hardware, so with any significant load, performance suffers on EC2.

In addition, EC2 presents its own operational pains and availability issues.  Users on EC2 generally have more problems than those with their own setups.

JG

> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Monday, February 07, 2011 11:25 AM
> To: user@hbase.apache.org
> Subject: Amazon EC2
> 
> Hi,
> 	We are looking at moving our cluster to Amazon's EC2 solution. Has
> anybody out there already done this or tried and would you have any
> recommendations/warning?
> 
> Thanks
> 
> -Pete

Amazon EC2

Posted by Peter Haidinyak <ph...@local.com>.
Hi,
	We are looking at moving our cluster to Amazon's EC2 solution. Has anybody out there already done this or tried, and would you have any recommendations/warnings?

Thanks

-Pete