Posted to user@hbase.apache.org by mi...@gmail.com on 2009/02/03 17:41:26 UTC

Hbase cluster configuration

Hi, all

Does anybody know a rule of thumb for calculating the parameters of an HBase
cluster to handle N read/write requests/sec (100K each) and manage M
terabytes of data?

For instance, we ran a cluster of 4 hosts: each data node/region server host
has 2 CPUs at 2GHz each, 7.5G RAM, and 850G disk. The performance is good
enough for now, but what if we have to manage 10T with this cluster?

Thank you for your cooperation,
M.

Re: Hbase cluster configuration

Posted by Michael Dagaev <mi...@gmail.com>.
Andrew, thanks.
Looks like I should think about the scalability of a single region server.
I will probably think it over and ask questions on the list later.

M.

On Wed, Feb 4, 2009 at 1:03 AM, Andrew Purtell <ap...@apache.org> wrote:
> Hi Michael,
>
> I have found that trial and error is necessary now. There are no
> clear formulas. How large the system can scale depends entirely
> on the distributions of various aspects of your data set and on
> the application specific load.
>
> Cluster start up is the most demanding time. If you have an
> inadequate number of xceivers available in the data node, you
> will see regions fail to deploy at start up with transient
> errors recorded in the master log regarding missing blocks.
> This is an indication you need to increase data node resources.
> I keep a tail of the master log up in a window when the cluster
> is starting up. Add a grep on "ERROR" if you just want to catch
> exceptional conditions. When you see this, increase the number
> of configured xceivers by a factor of two and restart.
>
> You should also increase the number of handlers in the data
> node configuration to the number of nodes in your cluster.
>
> Hope this helps,
>
>   - Andy
>
>> From: Michael Dagaev
>> Do you know the upper bound for regions per node for 0.18?
>> As I understand, 1000 regions is still OK but 4000 is not.
>> How can I estimate the amount of memory and number of
>> xceivers for 0.18 if I know the key and value size?

Re: Hbase cluster configuration

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I'd also recommend upgrading to 0.19.0 hadoop/hbase if you can.

Billy



"Andrew Purtell" <ap...@apache.org> wrote in 
message news:519047.50909.qm@web65513.mail.ac4.yahoo.com...
> Hi Michael,
>
> I have found that trial and error is necessary now. There are no
> clear formulas. How large the system can scale depends entirely
> on the distributions of various aspects of your data set and on
> the application specific load.
>
> Cluster start up is the most demanding time. If you have an
> inadequate number of xceivers available in the data node, you
> will see regions fail to deploy at start up with transient
> errors recorded in the master log regarding missing blocks.
> This is an indication you need to increase data node resources.
> I keep a tail of the master log up in a window when the cluster
> is starting up. Add a grep on "ERROR" if you just want to catch
> exceptional conditions. When you see this, increase the number
> of configured xceivers by a factor of two and restart.
>
> You should also increase the number of handlers in the data
> node configuration to the number of nodes in your cluster.
>
> Hope this helps,
>
>   - Andy
>
>> From: Michael Dagaev
>> Do you know the upper bound for regions per node for 0.18?
>> As I understand, 1000 regions is still OK but 4000 is not.
>> How can I estimate the amount of memory and number of
>> xceivers for 0.18 if I know the key and value size?

Re: Hbase cluster configuration

Posted by Andrew Purtell <ap...@apache.org>.
Hi Michael,

I have found that trial and error is necessary now. There are no
clear formulas. How large the system can scale depends entirely
on the distributions of various aspects of your data set and on
the application specific load.

Cluster start up is the most demanding time. If you have an
inadequate number of xceivers available in the data node, you
will see regions fail to deploy at start up with transient
errors recorded in the master log regarding missing blocks. 
This is an indication you need to increase data node resources.
I keep a tail of the master log up in a window when the cluster
is starting up. Add a grep on "ERROR" if you just want to catch
exceptional conditions. When you see this, increase the number
of configured xceivers by a factor of two and restart. 
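
For example, a minimal way to keep that watch (the exact log file name
depends on the user and host running the master, so adjust the pattern
to your install):

    tail -f $HBASE_HOME/logs/hbase-*-master-*.log | grep ERROR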

You should also increase the number of handlers in the data 
node configuration to the number of nodes in your cluster. 
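
As a sketch, both knobs live in hadoop-site.xml on the data nodes in the
0.18/0.19 era (the values below are illustrative starting points, not
recommendations; note the historical misspelling "xcievers"):

    <property>
      <name>dfs.datanode.max.xcievers</name>
      <!-- the default is low (256); double it each time startup
           shows missing-block errors -->
      <value>2048</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <!-- roughly the number of nodes in the cluster -->
      <value>10</value>
    </property>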

Hope this helps, 

   - Andy

> From: Michael Dagaev
> Do you know the upper bound for regions per node for 0.18?
> As I understand, 1000 regions is still OK but 4000 is not.
> How can I estimate the amount of memory and number of
> xceivers for 0.18 if I know the key and value size?

Re: Hbase cluster configuration

Posted by Michael Dagaev <mi...@gmail.com>.
Jonathan, I think I got it :)

Do you know the upper bound for regions per node for 0.18?
As I understand, 1000 regions is still OK but 4000 is not.

How can I estimate the amount of memory and number of xceivers for 0.18
if I know the key and value size?

M.

On Tue, Feb 3, 2009 at 10:48 PM, Jonathan Gray <jl...@streamy.com> wrote:
> Yes, you can of course add more disk space so you do not need as many nodes
> to support the dataset.
>
> However, the ability for a regionserver to scale to 4000 regions is largely
> unknown (and almost certainly impossible with the 0.19 release).  With a dataset
> that large you might increase region size from 256MB up to 1GB or so and
> then you'd be back to 1000 regions per node.
>
> Andrew Purtell has done the most experimentation with respect to scaling an
> individual regionserver, but you'll need to do some of your own
> experimentation to see how that would work with your setup.
>
> Amount of memory and number of xceivers will depend on your dataset and the
> version you're running on.  Memory usage is largely tied to index sizes (not
> including writes/memcache and any caching) and currently those are directly
> related to key and value size.  Xceivers will hopefully change dramatically
> with HADOOP-3856 but this will likely be an issue with scaling the number of
> regions on a single RS until it's fixed in the Datanode.
>
> JG
>
>> -----Original Message-----
>> From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
>> Sent: Tuesday, February 03, 2009 12:11 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: Hbase cluster configuration
>>
>> Thank you, Jonathan. I should have done the math :)
>>
>> > You would need ~40 nodes just to support 3X replication on HDFS.
>> With about
>> > 250GB per node, you would have around 1000 regions per node.
>>
>> OK. Can I just add more disk space to the existing nodes
>> instead of adding nodes to the cluster?
>>
>> For instance, if I want 10 nodes rather than 40, I will add 1TB per
>> node.
>> Thus, I will have 4000 regions per node and I will have to increase
>> the number of xceivers.
>> Should I add more memory to the nodes as well?
>>
>> > With 7.5GB of memory on each node, if you can give 3-4GB to the
>> > RegionServer, you should be able to handle that number of regions and
>> have
>> > sufficient memory for indexes and some caching.
>>
>> How much memory do I need to handle 1000 regions?
>>
>> >  With 0.19.0 hadoop and hbase, you'll be hitting xceiver issues for
>> sure,
>>
>> How many xceivers should I have?
>>
>> > but this should be
>> > resolved for the 0.20 release, at which point I am confident we could
>> handle
>> > that load.
>>
>> Thank you for your cooperation,
>> M.

RE: Hbase cluster configuration

Posted by Jonathan Gray <jl...@streamy.com>.
Yes, you can of course add more disk space so you do not need as many nodes
to support the dataset.

However, the ability for a regionserver to scale to 4000 regions is largely
unknown (and almost certainly impossible with the 0.19 release).  With a dataset
that large you might increase region size from 256MB up to 1GB or so and
then you'd be back to 1000 regions per node.
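
For example, a sketch of the hbase-site.xml change (1GB shown; the right
value depends on your dataset, and the default in the comment is the
0.19-era one):

    <property>
      <name>hbase.hregion.max.filesize</name>
      <!-- 1GB; the default is 268435456 (256MB) -->
      <value>1073741824</value>
    </property>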

Andrew Purtell has done the most experimentation with respect to scaling an
individual regionserver, but you'll need to do some of your own
experimentation to see how that would work with your setup.

Amount of memory and number of xceivers will depend on your dataset and the
version you're running on.  Memory usage is largely tied to index sizes (not
including writes/memcache and any caching) and currently those are directly
related to key and value size.  Xceivers will hopefully change dramatically
with HADOOP-3856 but this will likely be an issue with scaling the number of
regions on a single RS until it's fixed in the Datanode.
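
As a very rough sketch of that estimate (the index interval and per-entry
overhead here are assumptions to verify against your version, not known
constants):

    rows per region     ~ region_size / (key_size + value_size)
    index entries       ~ rows_per_region / index_interval  (io.map.index.interval)
    index memory/node   ~ regions_per_node x index_entries x (key_size + offset overhead)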

JG

> -----Original Message-----
> From: Michael Dagaev [mailto:michael.dagaev@gmail.com]
> Sent: Tuesday, February 03, 2009 12:11 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Hbase cluster configuration
> 
> Thank you, Jonathan. I should have done the math :)
> 
> > You would need ~40 nodes just to support 3X replication on HDFS.
> With about
> > 250GB per node, you would have around 1000 regions per node.
> 
> OK. Can I just add more disk space to the existing nodes
> instead of adding nodes to the cluster?
> 
> For instance, if I want 10 nodes rather than 40, I will add 1TB per
> node.
> Thus, I will have 4000 regions per node and I will have to increase
> the number of xceivers.
> Should I add more memory to the nodes as well?
> 
> > With 7.5GB of memory on each node, if you can give 3-4GB to the
> > RegionServer, you should be able to handle that number of regions and
> have
> > sufficient memory for indexes and some caching.
> 
> How much memory do I need to handle 1000 regions?
> 
> >  With 0.19.0 hadoop and hbase, you'll be hitting xceiver issues for
> sure,
> 
> How many xceivers should I have?
> 
> > but this should be
> > resolved for the 0.20 release, at which point I am confident we could
> handle
> > that load.
> 
> Thank you for your cooperation,
> M.


Re: Hbase cluster configuration

Posted by Michael Dagaev <mi...@gmail.com>.
Thank you, Jonathan. I should have done the math :)

> You would need ~40 nodes just to support 3X replication on HDFS.  With about
> 250GB per node, you would have around 1000 regions per node.

OK. Can I just add more disk space to the existing nodes
instead of adding nodes to the cluster?

For instance, if I want 10 nodes rather than 40, I will add 1TB per node.
Thus, I will have 4000 regions per node and I will have to increase
the number of xceivers.
Should I add more memory to the nodes as well?

> With 7.5GB of memory on each node, if you can give 3-4GB to the
> RegionServer, you should be able to handle that number of regions and have
> sufficient memory for indexes and some caching.

How much memory do I need to handle 1000 regions?

>  With 0.19.0 hadoop and hbase, you'll be hitting xceiver issues for sure,

How many xceivers should I have?

> but this should be
> resolved for the 0.20 release, at which point I am confident we could handle
> that load.

Thank you for your cooperation,
M.

RE: Hbase cluster configuration

Posted by Jonathan Gray <jl...@streamy.com>.
Michael,

You would need ~40 nodes just to support 3X replication on HDFS.  With about
250GB per node, you would have around 1000 regions per node.
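
Spelled out with the numbers above:

    10 TB x 3 replicas          = 30 TB of raw HDFS storage
    30 TB / 850 GB per node     = ~36, so call it ~40 nodes
    10 TB / 40 nodes            = ~250 GB of (unreplicated) data per node
    250 GB / 256 MB per region  = ~1000 regions per node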

With 7.5GB of memory on each node, if you can give 3-4GB to the
RegionServer, you should be able to handle that number of regions and have
sufficient memory for indexes and some caching.  With 0.19.0 hadoop and
hbase, you'll be hitting xceiver issues for sure, but this should be
resolved for the 0.20 release, at which point I am confident we could handle
that load.

You'd also need sufficient memory in the NameNode, though 30TB is not too
much.

That doesn't address the performance you need in terms of reading; you would
have to do your own benchmarks with your dataset and access pattern.  You
should be able to see how much concurrency you can pull out of an individual
regionserver and extrapolate that out to 40 nodes; read throughput scales (close
enough to) linearly if your reads are well distributed across the entire
dataset.  Of course, if you have hot spots you will be limited to the
performance of an individual server and will not benefit from a larger
cluster.

JG

> -----Original Message-----
> From: michael.dagaev@gmail.com [mailto:michael.dagaev@gmail.com]
> Sent: Tuesday, February 03, 2009 8:41 AM
> To: hbase-user@hadoop.apache.org
> Subject: Hbase cluster configuration
> 
> Hi, all
> 
> Does anybody know a rule of thumb for calculating the parameters of an HBase
> cluster to handle N read/write requests/sec (100K each) and manage M
> terabytes of data?
> 
> For instance, we ran a cluster of 4 hosts: each data node/region server
> host has 2 CPUs at 2GHz each, 7.5G RAM, and 850G disk. The performance is
> good enough for now, but what if we have to manage 10T with this cluster?
> 
> Thank you for your cooperation,
> M.