Posted to dev@hbase.apache.org by Joarder KAMAL <jo...@gmail.com> on 2013/06/27 01:24:21 UTC

Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Maybe a simple question to answer for experienced HBase users and
developers:

If I use hash partitioning to evenly distribute write workloads across my
region servers, and later add a new region server to scale out or split an
existing region, do I need to change my hash function and re-shuffle
all the existing data between all the region servers (old and new)? Or
is there a better solution for this? Any guidance would be very much
appreciated.

Thanks in advance.

Regards,
Joarder Kamal

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Shahab Yunus <sh...@gmail.com>.
I don't have a particular document or source stating this, but I think it
is actually self-explanatory if you think about the algorithm.

Anyway, you can read this:
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

And some older discussions by experts on this topic:
http://search-hadoop.com/?q=prefix+salt+key+hotspot&fc_project=HBase

Regards,
Shahab


On Thu, Jun 27, 2013 at 9:44 AM, Joarder KAMAL <jo...@gmail.com> wrote:

> Thanks Shahab for the reply. I was also thinking in the same way.
> Could you able to guide me through any reference which can confirm this
> understanding?
>
> 
> Regards,
> Joarder Kamal
>
>
>
> On 27 June 2013 23:24, Shahab Yunus <sh...@gmail.com> wrote:
>
> > I think you will need to update your hash function and redistribute data.
> > As far as I know this has been on of the drawbacks of this approach (and
> > the SemaText library)
> >
> > Regards,
> > Shahab
> >
> >
> > On Wed, Jun 26, 2013 at 7:24 PM, Joarder KAMAL <jo...@gmail.com>
> wrote:
> >
> > > May be a simple question to answer for the experienced HBase users and
> > > developers:
> > >
> > > If I use hash partitioning to evenly distribute write workloads into my
> > > region servers and later add a new region server to scale or split an
> > > existing region, then do I need to change my hash function and
> re-shuffle
> > > all the existing data in between all the region servers (old and new)?
> > Or,
> > > is there any better solution for this? Any guidance would be very much
> > > helpful.
> > >
> > > Thanks in advance.
> > >
> > >
> > > Regards,
> > > Joarder Kamal
> > >
> >
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Joarder KAMAL <jo...@gmail.com>.
Thanks, Shahab, for the reply. I was thinking the same way.
Could you guide me to any reference that can confirm this understanding?

Regards,
Joarder Kamal



On 27 June 2013 23:24, Shahab Yunus <sh...@gmail.com> wrote:

> I think you will need to update your hash function and redistribute data.
> As far as I know this has been on of the drawbacks of this approach (and
> the SemaText library)
>
> Regards,
> Shahab
>
>
> On Wed, Jun 26, 2013 at 7:24 PM, Joarder KAMAL <jo...@gmail.com> wrote:
>
> > May be a simple question to answer for the experienced HBase users and
> > developers:
> >
> > If I use hash partitioning to evenly distribute write workloads into my
> > region servers and later add a new region server to scale or split an
> > existing region, then do I need to change my hash function and re-shuffle
> > all the existing data in between all the region servers (old and new)?
> Or,
> > is there any better solution for this? Any guidance would be very much
> > helpful.
> >
> > Thanks in advance.
> >
> >
> > Regards,
> > Joarder Kamal
> >
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Shahab Yunus <sh...@gmail.com>.
I think you will need to update your hash function and redistribute the
data. As far as I know, this has been one of the drawbacks of this approach
(and the Sematext library).
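A quick way to see why, sketched in plain Java (the class and method names
below are mine, not anything from HBase): under `hash % N` placement,
changing N remaps almost every existing key, which is exactly the
redistribution cost being discussed.

```java
import java.util.stream.IntStream;

public class ModuloReshuffle {

    // Count how many keys land in a different bucket when the bucket
    // count used by `hash % N` placement grows from oldN to newN.
    static long moved(int keys, int oldN, int newN) {
        return IntStream.range(0, keys)
                .filter(k -> Integer.hashCode(k) % oldN != Integer.hashCode(k) % newN)
                .count();
    }

    public static void main(String[] args) {
        // Growing from 8 to 9 buckets remaps roughly 8/9 of all keys.
        System.out.println(moved(100_000, 8, 9) + " of 100000 keys moved");
    }
}
```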

Regards,
Shahab


On Wed, Jun 26, 2013 at 7:24 PM, Joarder KAMAL <jo...@gmail.com> wrote:

> May be a simple question to answer for the experienced HBase users and
> developers:
>
> If I use hash partitioning to evenly distribute write workloads into my
> region servers and later add a new region server to scale or split an
> existing region, then do I need to change my hash function and re-shuffle
> all the existing data in between all the region servers (old and new)? Or,
> is there any better solution for this? Any guidance would be very much
> helpful.
>
> Thanks in advance.
>
>  
> Regards,
> Joarder Kamal
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by ramkrishna vasudevan <ra...@gmail.com>.
I would suggest you write a custom load balancer and have your hashing
algorithm determine how the load balancing should happen. Hope this helps.
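For reference, a custom balancer is wired in through hbase-site.xml. The
property name below is the one used in the HBase configuration docs, but
the balancer class (com.example.HashAwareBalancer) is a hypothetical
placeholder for whatever implements your hashing-aware placement:

```xml
<!-- hbase-site.xml: point the master at a custom balancer implementation.
     com.example.HashAwareBalancer is hypothetical; it would extend the
     shipped balancer and consult your hashing scheme when placing regions. -->
<property>
  <name>hbase.master.loadbalancer.class</name>
  <value>com.example.HashAwareBalancer</value>
</property>
```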

Regards
Ram


On Fri, Jun 28, 2013 at 5:32 AM, Joarder KAMAL <jo...@gmail.com> wrote:

> Thanks St.Ack for mentioning about the load-balancer.
>
> But my question was two folded:
> Case-1. If a new RS is added, then the load-balancer will do it's job
> considering no new region has been created in the meanwhile. // As you've
> already answered.
>
> Case-2. Whether a new RS is added or not, an existing region is splitted
> into two, then how the new writes will to the new region? Because, lets say
> initially the hash function was calculated with *N* Regions and now there
> are *N+1* Regions in the cluster.
>
> In that case, do I need to change the Hash function and reshuffle all the
> existing data within the cluster !! Or, HBase has some mechanism to handle
> this?
>
>
> Many thanks again for helping me out...
>
>
> 
> Regards,
> Joarder Kamal
>
> On 28 June 2013 02:26, Stack <st...@duboce.net> wrote:
>
> > On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <jo...@gmail.com>
> wrote:
> >
> > > May be a simple question to answer for the experienced HBase users and
> > > developers:
> > >
> > > If I use hash partitioning to evenly distribute write workloads into my
> > > region servers and later add a new region server to scale or split an
> > > existing region, then do I need to change my hash function and
> re-shuffle
> > > all the existing data in between all the region servers (old and new)?
> > Or,
> > > is there any better solution for this? Any guidance would be very much
> > > helpful.
> > >
> >
> > You do not need to change your hash function.
> >
> > When you add a new regionserver, the balancer will move some of the
> > existing regions to the new host.
> >
> > St.Ack
> >
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Joarder KAMAL <jo...@gmail.com>.
Dear Nick,

Thanks a lot for the nice explanation. I am just left with one small
confusion:

As you said: "Your logical partitioning remains the same whether it's being
served by N, 2N, or 3.5N+2 RegionServers."

When a region is split into two, I guess the logical partitioning changes,
right? Could you kindly clarify a bit more?
And my question is: without making any change to my initial
row-key-generation function, how will new writes go to the new regions?
I assume it is hard to predict the number of RS initially, as well as to
create pre-split regions, in very large-scale production systems. I am not
worried about the default load-balancing behaviour of HBase; St.Ack and
you have also clearly explained that.

For example, as indicated in Lars George's book, where he used <# of
RS> while computing the prefix, I guess <# of Regions> could also be used
(if regions are pre-split):
----------------------------------------------------------------------------------------------------------------------------
*Salting*
You can use a salting prefix to the key that guarantees a spread of all
rows across all region servers. For example:

byte prefix = (byte) (Long.hashCode(timestamp) % <number of region servers>);
byte[] rowkey = Bytes.add(Bytes.toBytes(prefix), Bytes.toBytes(timestamp));

This formula will generate enough prefix numbers to ensure that rows are
sent to all region servers. Of course, the formula assumes a *specific*
number of servers, and if you are planning to *grow your cluster* you
should set this number to a *multiple* instead.
----------------------------------------------------------------------------------------------------------------------------
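As a hedged aside, the salting idea from the excerpt can be written
self-contained in plain Java, without HBase's Bytes helper; the fixed
BUCKETS constant and the method name below are my own. Keeping the bucket
count fixed (rather than tied to the live server or region count) is what
lets the cluster grow without ever rewriting keys:

```java
public class SaltedKey {

    // Number of salt buckets: chosen once and kept constant as the
    // cluster grows; regions within a bucket can still split freely.
    static final int BUCKETS = 16;

    // Build a rowkey of [1-byte salt prefix][8-byte big-endian timestamp].
    // Math.floorMod avoids a negative prefix for negative hash values.
    static byte[] saltedRowKey(long timestamp) {
        byte prefix = (byte) Math.floorMod(Long.hashCode(timestamp), BUCKETS);
        byte[] key = new byte[1 + Long.BYTES];
        key[0] = prefix;
        for (int i = 0; i < Long.BYTES; i++) {
            key[1 + i] = (byte) (timestamp >>> (8 * (Long.BYTES - 1 - i)));
        }
        return key;
    }
}
```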

Thanks again ...

Regards,
Joarder Kamal


On 3 July 2013 03:37, Nick Dimiduk <nd...@gmail.com> wrote:

> Hi Joarder,
>
> I think you're slightly confused about the impact of using a hashed (or
> sometimes called "salted") prefix for your rowkeys. This strategy for
> rowkey design has an impact on the logical ordering of your data, not
> necessarily the physical distribution of your data. In HBase, these are
> orthogonal concerns. It means that to execute a bucket-agnostic query, the
> client must initiate N scans. However, there's no guarantee that all
> regions starting with the same hash land on the same RegionServer. Region
> assignment is a complex beast; as I understand, it's based on a randomish,
> load-based assignment.
>
> Take a look at your existing table distributed on a size-N cluster. Do all
> regions that fall within the first bucket sit on the same RegionServer?
> Likely not. However, look at the number of regions assigned to each
> RegionServer. This should be close to even. Adding a new RegionServer to
> the cluster will result in some of those regions migrating from the other
> servers to the new one. The impact will be a decrease in the average number
> of regions served per RegionServer.
>
> Your logical partitioning remains the same whether it's being served by N,
> 2N, or 3.5N+2 RegionServers. Your client always needs to execute that
> bucket-agnostic query as N scans, touching each of the N buckets. Precisely
> which RegionServers are touched by any given scan depends entirely on how
> the balancer has distributed load on your cluster.
>
> Thanks,
> Nick
>
> On Thu, Jun 27, 2013 at 5:02 PM, Joarder KAMAL <jo...@gmail.com> wrote:
>
> > Thanks St.Ack for mentioning about the load-balancer.
> >
> > But my question was two folded:
> > Case-1. If a new RS is added, then the load-balancer will do it's job
> > considering no new region has been created in the meanwhile. // As you've
> > already answered.
> >
> > Case-2. Whether a new RS is added or not, an existing region is splitted
> > into two, then how the new writes will to the new region? Because, lets
> say
> > initially the hash function was calculated with *N* Regions and now there
> > are *N+1* Regions in the cluster.
> >
> > ​In that case, do I need to change the Hash function and reshuffle all
> the
> > existing data within the cluster !! Or, HBase has some mechanism to
> handle
> > this?​
> >
> >
> > ​Many thanks again for helping me out...​
> >
> >
> > ​
> > Regards,
> > Joarder Kamal
> >
> > On 28 June 2013 02:26, Stack <st...@duboce.net> wrote:
> >
> > > On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <jo...@gmail.com>
> > wrote:
> > >
> > > > May be a simple question to answer for the experienced HBase users
> and
> > > > developers:
> > > >
> > > > If I use hash partitioning to evenly distribute write workloads into
> my
> > > > region servers and later add a new region server to scale or split an
> > > > existing region, then do I need to change my hash function and
> > re-shuffle
> > > > all the existing data in between all the region servers (old and
> new)?
> > > Or,
> > > > is there any better solution for this? Any guidance would be very
> much
> > > > helpful.
> > > >
> > >
> > > You do not need to change your hash function.
> > >
> > > When you add a new regionserver, the balancer will move some of the
> > > existing regions to the new host.
> > >
> > > St.Ack
> > >
> >
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Nick Dimiduk <nd...@gmail.com>.
Hi Joarder,

I think you're slightly confused about the impact of using a hashed (or
sometimes called "salted") prefix for your rowkeys. This strategy for
rowkey design has an impact on the logical ordering of your data, not
necessarily the physical distribution of your data. In HBase, these are
orthogonal concerns. It means that to execute a bucket-agnostic query, the
client must initiate N scans. However, there's no guarantee that all
regions starting with the same hash land on the same RegionServer. Region
assignment is a complex beast; as I understand it, it's based on a randomish,
load-based assignment.

Take a look at your existing table distributed on a size-N cluster. Do all
regions that fall within the first bucket sit on the same RegionServer?
Likely not. However, look at the number of regions assigned to each
RegionServer. This should be close to even. Adding a new RegionServer to
the cluster will result in some of those regions migrating from the other
servers to the new one. The impact will be a decrease in the average number
of regions served per RegionServer.

Your logical partitioning remains the same whether it's being served by N,
2N, or 3.5N+2 RegionServers. Your client always needs to execute that
bucket-agnostic query as N scans, touching each of the N buckets. Precisely
which RegionServers are touched by any given scan depends entirely on how
the balancer has distributed load on your cluster.
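The N-scan fan-out described above can be sketched in plain Java (no HBase
types; the helper names are mine): build one (startRow, stopRow) pair per
salt bucket, each of which would back one client-side Scan.

```java
import java.util.ArrayList;
import java.util.List;

public class BucketScans {

    // For a bucket-agnostic query over [start, stop), build one
    // (startRow, stopRow) pair per salt bucket; each pair would back
    // one HBase Scan issued by the client.
    static List<byte[][]> scanRanges(int buckets, byte[] start, byte[] stop) {
        List<byte[][]> ranges = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            ranges.add(new byte[][] {
                prefix((byte) b, start),   // salted start row
                prefix((byte) b, stop)     // salted stop row
            });
        }
        return ranges;
    }

    // Prepend the one-byte salt to an unsalted key.
    static byte[] prefix(byte salt, byte[] key) {
        byte[] out = new byte[1 + key.length];
        out[0] = salt;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }
}
```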

Thanks,
Nick

On Thu, Jun 27, 2013 at 5:02 PM, Joarder KAMAL <jo...@gmail.com> wrote:

> Thanks St.Ack for mentioning about the load-balancer.
>
> But my question was two folded:
> Case-1. If a new RS is added, then the load-balancer will do it's job
> considering no new region has been created in the meanwhile. // As you've
> already answered.
>
> Case-2. Whether a new RS is added or not, an existing region is splitted
> into two, then how the new writes will to the new region? Because, lets say
> initially the hash function was calculated with *N* Regions and now there
> are *N+1* Regions in the cluster.
>
> ​In that case, do I need to change the Hash function and reshuffle all the
> existing data within the cluster !! Or, HBase has some mechanism to handle
> this?​
>
>
> ​Many thanks again for helping me out...​
>
>
> ​
> Regards,
> Joarder Kamal
>
> On 28 June 2013 02:26, Stack <st...@duboce.net> wrote:
>
> > On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <jo...@gmail.com>
> wrote:
> >
> > > May be a simple question to answer for the experienced HBase users and
> > > developers:
> > >
> > > If I use hash partitioning to evenly distribute write workloads into my
> > > region servers and later add a new region server to scale or split an
> > > existing region, then do I need to change my hash function and
> re-shuffle
> > > all the existing data in between all the region servers (old and new)?
> > Or,
> > > is there any better solution for this? Any guidance would be very much
> > > helpful.
> > >
> >
> > You do not need to change your hash function.
> >
> > When you add a new regionserver, the balancer will move some of the
> > existing regions to the new host.
> >
> > St.Ack
> >
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Joarder KAMAL <jo...@gmail.com>.
Thanks, St.Ack, for mentioning the load-balancer.

But my question was twofold:
Case-1. If a new RS is added, then the load-balancer will do its job,
assuming no new region has been created in the meantime. // As you've
already answered.

Case-2. Whether or not a new RS is added, if an existing region is split
into two, how will the new writes go to the new region? Because, let's say,
initially the hash function was calculated with *N* regions and now there
are *N+1* regions in the cluster.

In that case, do I need to change the hash function and reshuffle all the
existing data within the cluster? Or does HBase have some mechanism to
handle this?

Many thanks again for helping me out...

Regards,
Joarder Kamal

On 28 June 2013 02:26, Stack <st...@duboce.net> wrote:

> On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <jo...@gmail.com> wrote:
>
> > May be a simple question to answer for the experienced HBase users and
> > developers:
> >
> > If I use hash partitioning to evenly distribute write workloads into my
> > region servers and later add a new region server to scale or split an
> > existing region, then do I need to change my hash function and re-shuffle
> > all the existing data in between all the region servers (old and new)?
> Or,
> > is there any better solution for this? Any guidance would be very much
> > helpful.
> >
>
> You do not need to change your hash function.
>
> When you add a new regionserver, the balancer will move some of the
> existing regions to the new host.
>
> St.Ack
>

Re: Adding a new region server or splitting an old region in a Hash-partitioned HBase Data Store

Posted by Stack <st...@duboce.net>.
On Wed, Jun 26, 2013 at 4:24 PM, Joarder KAMAL <jo...@gmail.com> wrote:

> May be a simple question to answer for the experienced HBase users and
> developers:
>
> If I use hash partitioning to evenly distribute write workloads into my
> region servers and later add a new region server to scale or split an
> existing region, then do I need to change my hash function and re-shuffle
> all the existing data in between all the region servers (old and new)? Or,
> is there any better solution for this? Any guidance would be very much
> helpful.
>

You do not need to change your hash function.

When you add a new regionserver, the balancer will move some of the
existing regions to the new host.

St.Ack