Posted to user@hbase.apache.org by Felix Sprick <fs...@gmail.com> on 2011/05/04 09:54:09 UTC

hbase data distribution

Hi,

I want my hbase clients to use all machines in the hbase cluster when
writing data concurrently. How should I design the rowkey, and what
other settings do I have to configure, so that all machines in the
cluster are addressed and not all writes end up on the same
regionserver? I have a test setup with 10 clients and 4 regionservers,
so I would like to see all 4 regionservers used when the 10 clients
write data into hbase in parallel.

thanks,
Felix

Re: hbase data distribution

Posted by Stack <st...@duboce.net>.

Make sure you have at least as many regions as you have servers when
you start loading.  See
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#createTable(org.apache.hadoop.hbase.HTableDescriptor, byte[], byte[], int)
and its adjacent methods in the API.  If you choose the region
boundaries carefully, you should get even loading from the get-go.
Otherwise, you'll have to wait until you have loaded enough data for
HBase balancing to have an effect.  You can hand-split regions and
move them manually in the shell during the load startup if you want to
bring on the balance ahead of the automated balancer (it runs every 5
minutes by default; you can also force a balance run from the shell).
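
As a rough illustration of choosing region boundaries up front, here is
a minimal sketch. It assumes rowkeys start with one evenly distributed
byte (for example a hash-derived prefix); that assumption and the names
SplitKeys and evenSplits are hypothetical and not from this thread. It
only computes the split keys. Passing them to HBaseAdmin's createTable
overload that takes a byte[][] of split keys is left as a comment, since
the exact call depends on your client version.

```java
// Sketch only: computes one-byte split keys that cut the unsigned
// byte range 0x00..0xFF into numRegions near-equal slices.
// Assumes the first rowkey byte is evenly distributed (hypothetical).
class SplitKeys {

    // Returns numRegions - 1 boundaries; HBase sorts rowkeys as
    // unsigned bytes, so 0x40 < 0x80 < 0xC0 holds for region ordering.
    static byte[][] evenSplits(int numRegions) {
        if (numRegions < 2 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be 2..256");
        }
        byte[][] splits = new byte[numRegions - 1][];
        for (int i = 1; i < numRegions; i++) {
            splits[i - 1] = new byte[] { (byte) (i * 256 / numRegions) };
        }
        return splits;
    }

    public static void main(String[] args) {
        // One region per regionserver in the 4-server test setup.
        for (byte[] key : evenSplits(4)) {
            System.out.println(String.format("%02x", key[0] & 0xFF));
        }
        // With an HBase client on the classpath, the result could be
        // handed to table creation, e.g. (assumed usage, not run here):
        //   admin.createTable(tableDescriptor, evenSplits(4));
    }
}
```

With 4 regions the boundaries are 0x40, 0x80 and 0xC0, so writes whose
first key byte is spread evenly should land on all four regions from the
first put.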

St.Ack

On Wed, May 4, 2011 at 12:54 AM, Felix Sprick <fs...@gmail.com> wrote:
> Hi,
>
> I want my hbase clients to use all machines in the hbase cluster when
> writing data concurrently. How should I design the rowkey, and what
> other settings do I have to configure, so that all machines in the
> cluster are addressed and not all writes end up on the same
> regionserver? I have a test setup with 10 clients and 4 regionservers,
> so I would like to see all 4 regionservers used when the 10 clients
> write data into hbase in parallel.
>
> thanks,
> Felix
>