
Posted to user@phoenix.apache.org by James Heather <ja...@mendeley.com> on 2015/09/03 10:32:20 UTC

Rebalancing after adding a new node

Suppose I create a table with a billion rows, on a cluster with N nodes.
Then I want to increase performance, so I add a new node to the cluster.
Obviously the data is still stored on the first N nodes, and not on the new
one. Is there a way of redistributing the data (online) to take advantage
of the new node?

I realise the answer might depend on the configuration of the table. If
there are schemas that fit this notion well, and schemas that don't, I'd be
interested to know about that too.

(This will be running on CDH5, if that makes a difference.)

James

Re: Rebalancing after adding a new node

Posted by Vladimir Rodionov <vl...@gmail.com>.

HBase does that automatically for you. The HBase balancer will redistribute
regions across the region servers, and data locality will be restored after
the next major compaction. But ... the HBase balancer works at a global level
(all tables) and cannot rebalance only one table. Besides this, there is a
separate beast, the HDFS balancer, which makes its own decisions and does not
care much about HBase data. For this reason it is recommended to disable the
HDFS balancer in an HBase cluster.

-Vlad
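
To make the redistribution described above concrete, here is a toy sketch of
count-based region balancing. It is only an illustration under stated
assumptions: the server names (rs1..rs4), region names (r0..r11) and the
rebalance() helper are hypothetical, and the real HBase balancer uses its own,
more elaborate logic. The point is that whole regions, not individual rows,
move to the new node.

```python
def rebalance(assignment, servers):
    """Toy count-based balancer (NOT HBase's actual algorithm).

    Moves regions from the most loaded server to the least loaded one
    until region counts differ by at most one. Returns the new
    assignment (server -> list of regions) and the list of moves.
    """
    load = {s: list(assignment.get(s, [])) for s in servers}
    moves = []
    while True:
        busiest = max(load, key=lambda s: len(load[s]))
        idlest = min(load, key=lambda s: len(load[s]))
        if len(load[busiest]) - len(load[idlest]) <= 1:
            break
        region = load[busiest].pop()
        load[idlest].append(region)
        moves.append((region, busiest, idlest))
    return load, moves


# Hypothetical cluster: 3 region servers hold 12 regions, then rs4 joins.
before = {
    "rs1": ["r0", "r1", "r2", "r3"],
    "rs2": ["r4", "r5", "r6", "r7"],
    "rs3": ["r8", "r9", "r10", "r11"],
}
after, moves = rebalance(before, ["rs1", "rs2", "rs3", "rs4"])

for server in sorted(after):
    print(server, len(after[server]))
print("regions moved:", len(moves))
```

In this model only the region assignment changes. On a real cluster the moved
regions' files stay on the old nodes' disks until a major compaction rewrites
them, which is why locality is restored only after that step. The balancer
run and the compaction can also be requested by hand from the HBase shell with
the balancer and major_compact commands.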
