Posted to user@hbase.apache.org by Bill Q <bi...@gmail.com> on 2014/01/20 07:27:39 UTC

Questions about HBase load balancing and HFile

Hi,
I am trying to get more information about HBase. I would appreciate some
answers to these few questions. Thanks a lot.

1. About load balancing: does HMaster monitor overloaded and lightly loaded
HRegionServers, and move some regions from the hot HRegionServer to lightly
loaded ones (with or without adding new servers to the cluster)?

2. About region splitting: when splitting a region, will the newly created
regions stay on the current HRegionServer, or will HMaster assign other
HRegionServers to take the two newly created regions?

3. About HFile size: Lars mentioned here
http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html that
the HFile size defaults to 64k. How does this work when the default HDFS
block is 64M/128M? Would the small HFile size waste lots of space on HDFS?

4. About data locality: if an HRegionServer fails, the HMaster would assign
a new HRegionServer to take its place. But should this new HRegionServer
have access to the storeFiles? I assumed that's how it works, by
using HDFS's data replication. But after some reading, I got confused. It
seems that the new HRegionServer can work without having the storeFiles data
locally. How does this work at all?

Many thanks.


Bill

Re: Questions about HBase load balancing and HFile

Posted by Ted Yu <yu...@gmail.com>.
bq. capacity load on terms of numbers of regions per region server

I guess you meant to say 'in terms of ...'

Yes. The 0.94 load balancer looks at region count only.
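
As a rough illustration of what "region count only" means, here is a minimal sketch. It is not the actual HBase balancer code: the function name `balance_by_region_count` and the server and region names are made up for this example, and it ignores the per-table aspect mentioned in the thread. It moves regions from the server with the most regions to the one with the fewest until the counts differ by at most one, and it never looks at request load.

```python
def balance_by_region_count(assignments):
    """Even out region counts across servers.

    assignments: dict mapping server name -> list of region names.
    Returns a list of (region, source_server, destination_server) moves.
    Hypothetical helper for illustration; request load is ignored on purpose.
    """
    if not assignments:
        return []
    # Work on copies so the caller's data is left untouched.
    state = {server: list(regions) for server, regions in assignments.items()}
    moves = []
    while True:
        busiest = max(state, key=lambda s: len(state[s]))
        idlest = min(state, key=lambda s: len(state[s]))
        # Stop once the counts are as even as they can get.
        if len(state[busiest]) - len(state[idlest]) <= 1:
            return moves
        region = state[busiest].pop()
        state[idlest].append(region)
        moves.append((region, busiest, idlest))


moves = balance_by_region_count({"rs1": ["r1", "r2", "r3", "r4"], "rs2": []})
print(moves)
```

Note that a single very hot region would not trigger any move here if the counts are already even, which is the limitation discussed for 0.94.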


On Tue, Jan 21, 2014 at 9:39 PM, Asaf Mesika <as...@gmail.com> wrote:

> If hot means many requests, then it's only in 0.96 right? 0.94 is only
> addressing capacity load on terms of numbers of regions per region server
> of the same table.

Re: Questions about HBase load balancing and HFile

Posted by Asaf Mesika <as...@gmail.com>.
If hot means many requests, then it's only in 0.96 right? 0.94 is only
addressing capacity load on terms of numbers of regions per region server
of the same table.

On Monday, January 20, 2014, Ted Yu <yu...@gmail.com> wrote:

> bq. under heavy load by serving to hot regions
>
> Did you mean 'two hot regions' ?
> If so, the master will move one of them to another RS.
>
> Cheers

Re: Questions about HBase load balancing and HFile

Posted by Ted Yu <yu...@gmail.com>.
bq. under heavy load by serving to hot regions

Did you mean 'two hot regions' ?
If so, the master will move one of them to another RS.

Cheers


On Mon, Jan 20, 2014 at 6:17 AM, Bill Q <bi...@gmail.com> wrote:

> Hi Ted and Bharath,
> Thanks a lot for the replies.
>
> For question #1, if an RS is under heavy load by serving to hot
> regions, will the HMaster move one of the two regions to another RS, or
> will it split both of them and move the newly created halves to other
> RSs?
>
> For question #3, does this mean that an HFile has many 64k blocks, but
> itself is around 64M (or 128M)?
>
>
> Many thanks.
>
>
> Bill

Re: Questions about HBase load balancing and HFile

Posted by Bill Q <bi...@gmail.com>.
Hi Ted and Bharath,
Thanks a lot for the replies.

For question #1, if an RS is under heavy load by serving to hot
regions, will the HMaster move one of the two regions to another RS, or
will it split both of them and move the newly created halves to other
RSs?

For question #3, does this mean that an HFile has many 64k blocks, but
itself is around 64M (or 128M)?


Many thanks.


Bill


On Mon, Jan 20, 2014 at 1:49 AM, Bharath Vissapragada <bharathv@cloudera.com> wrote:

> For question #3, the block size Lars talks about is the block size inside an
> HFile, which is different from the HDFS block size. Look at
> http://hbase.apache.org/book/apes03.html . An HFile is indexed by block to
> facilitate random access to data, so that we can skip unnecessary disk
> blocks during gets/scans. The smaller the HFile block size, the better the
> random read performance. You can see the detailed HFile layout in that link.
>
> For question #4, you are correct: since the data resides on HDFS, each
> region server has access to all the storefiles (they just use the HDFS API to
> read them). They are still available after a (RS+datanode) crash
> because of the replication in HDFS. The store files still have valid
> replicas, and the namenode tries to maintain the replication factor by
> re-replicating them eventually.

Re: Questions about HBase load balancing and HFile

Posted by Ted Yu <yu...@gmail.com>.
For question #4, see also
http://hbase.apache.org/book.html#regions.arch.locality

Cheers


On Sun, Jan 19, 2014 at 10:49 PM, Bharath Vissapragada <
bharathv@cloudera.com> wrote:

> For question #3, the block size Lars talks about is the block size inside an
> HFile, which is different from the HDFS block size. Look at
> http://hbase.apache.org/book/apes03.html . An HFile is indexed by block to
> facilitate random access to data, so that we can skip unnecessary disk
> blocks during gets/scans. The smaller the HFile block size, the better the
> random read performance. You can see the detailed HFile layout in that link.
>
> For question #4, you are correct: since the data resides on HDFS, each
> region server has access to all the storefiles (they just use the HDFS API to
> read them). They are still available after a (RS+datanode) crash
> because of the replication in HDFS. The store files still have valid
> replicas, and the namenode tries to maintain the replication factor by
> re-replicating them eventually.

Re: Questions about HBase load balancing and HFile

Posted by Bharath Vissapragada <bh...@cloudera.com>.
For question #3, the block size Lars talks about is the block size inside an
HFile, which is different from the HDFS block size. Look at
http://hbase.apache.org/book/apes03.html . An HFile is indexed by block to
facilitate random access to data, so that we can skip unnecessary disk
blocks during gets/scans. The smaller the HFile block size, the better the
random read performance. You can see the detailed HFile layout in that link.
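
To make the two block sizes concrete, here is a small sketch under simplifying assumptions: blocks are treated as exactly 64 KB (real HFile blocks are only approximately that size), the HDFS block is taken as 128 MB, and the "index" is just a sorted list of the first key in each block. The name `find_block` and the sample keys are hypothetical and not part of any HBase API.

```python
import bisect

HFILE_BLOCK_SIZE = 64 * 1024         # 64 KB, the HFile block size discussed above
HDFS_BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, one of the HDFS block sizes mentioned

# How many HFile blocks fit into a single HDFS block.
blocks_per_hdfs_block = HDFS_BLOCK_SIZE // HFILE_BLOCK_SIZE


def find_block(index_keys, row_key):
    """Return the position of the block that may contain row_key.

    index_keys is a sorted list holding the first key of each block,
    a simplified stand-in for an HFile block index. Keys that sort
    before the first block are mapped to block 0.
    """
    pos = bisect.bisect_right(index_keys, row_key) - 1
    return max(pos, 0)


print(blocks_per_hdfs_block)
print(find_block(["a", "g", "n", "t"], "h"))
```

The point of the lookup is that a get only has to read the one small block the index points at, while the file as a whole can still span one or more large HDFS blocks.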

For question #4, you are correct: since the data resides on HDFS, each
region server has access to all the storefiles (they just use the HDFS API to
read them). They are still available after a (RS+datanode) crash
because of the replication in HDFS. The store files still have valid
replicas, and the namenode tries to maintain the replication factor by
re-replicating them eventually.
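
The replication argument can be sketched as follows. This is only an illustration: the function names, the datanode names, and the default replication factor of 3 are assumptions for this example, and it does not model how the namenode actually picks targets.

```python
def surviving_replicas(replica_hosts, failed_hosts):
    """Return the hosts that still hold a copy of a store file block."""
    return [host for host in replica_hosts if host not in failed_hosts]


def copies_to_restore(replica_hosts, failed_hosts, replication_factor=3):
    """Number of new copies needed to get back to the replication factor."""
    alive = len(surviving_replicas(replica_hosts, failed_hosts))
    return max(replication_factor - alive, 0)


# One of three datanodes (co-located with the crashed region server) is lost.
replicas = ["dn1", "dn2", "dn3"]
print(surviving_replicas(replicas, {"dn1"}))
print(copies_to_restore(replicas, {"dn1"}))
```

As long as the list of surviving replicas is not empty, any region server can still read the block over the network, which is why the newly assigned region server does not need a local copy to start serving.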


On Mon, Jan 20, 2014 at 12:08 PM, Ted Yu <yu...@gmail.com> wrote:

> For question #1, there is a load balancer in HMaster which does the job of
> balancing region load.
>
> For number 2, the daughter regions stay on the same server as the parent
> after the split. Later one or both of them may be moved to other region servers.
>
> Cheers



-- 
Bharath Vissapragada
<http://www.cloudera.com>

Re: Questions about HBase load balancing and HFile

Posted by Ted Yu <yu...@gmail.com>.
For question #1, there is a load balancer in HMaster which does the job of balancing region load.

For number 2, the daughter regions stay on the same server as the parent after the split. Later one or both of them may be moved to other region servers.

Cheers

On Jan 19, 2014, at 10:27 PM, Bill Q <bi...@gmail.com> wrote:

> Hi,
> I am trying to get more information about HBase. I would appreciate some
> answers to these few questions. Thanks a lot.
> 
> 1. About load balancing: does HMaster monitor overloaded and lightly loaded
> HRegionServers, and move some regions from the hot HRegionServer to lightly
> loaded ones (with or without adding new servers to the cluster)?
>
> 2. About region splitting: when splitting a region, will the newly created
> regions stay on the current HRegionServer, or will HMaster assign other
> HRegionServers to take the two newly created regions?
>
> 3. About HFile size: Lars mentioned here
> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html that
> the HFile size defaults to 64k. How does this work when the default HDFS
> block is 64M/128M? Would the small HFile size waste lots of space on HDFS?
>
> 4. About data locality: if an HRegionServer fails, the HMaster would assign
> a new HRegionServer to take its place. But should this new HRegionServer
> have access to the storeFiles? I assumed that's how it works, by
> using HDFS's data replication. But after some reading, I got confused. It
> seems that the new HRegionServer can work without having the storeFiles data
> locally. How does this work at all?
> 
> Many thanks.
> 
> 
> Bill