Posted to user@hbase.apache.org by Hari Krishna <ha...@gmail.com> on 2013/12/27 07:02:15 UTC

Pre splitting the HBase Table for specific row key design

Hi,

We are planning to migrate from a CDH3 cluster to a CDH4 cluster, and as part
of the migration we also plan to use HBase instead of the Hive warehouse we
currently use on the CDH3 cluster. Every day we bring data from Oracle into
Hadoop using Sqoop, importing from 10 different database schemas.

In the Hive warehouse we maintain a table partitioned first by schema name
and then by date within each schema partition. Each day's data for the table
lands in the corresponding date partition.

In HBase we have designed the table's row key as a concatenation of: the byte
value of a bucket number (ranging from 0 to 15, so 16 buckets in total),
MD5(schema name), MD5(date), and the byte representation of the pkid. This is
working as expected: we can retrieve data by schema and by date, which is our
key use case. Within each bucket, keys range from 0 to long max.
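
The composite key described above could be assembled along these lines. This
is only a sketch of my understanding of the design: the class and method
names (RowKeyBuilder, buildRowKey), the one-byte encoding of the bucket
number, and the long encoding of pkid are my assumptions, not taken from the
original post.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class RowKeyBuilder {

    // MD5 digest of a UTF-8 string: always 16 bytes.
    static byte[] md5(String s) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
    }

    // Key layout (assumed): [1-byte bucket][16-byte MD5(schema)]
    //                       [16-byte MD5(date)][8-byte pkid]
    static byte[] buildRowKey(int bucket, String schema, String date, long pkid)
            throws NoSuchAlgorithmException {
        ByteBuffer key = ByteBuffer.allocate(1 + 16 + 16 + 8);
        key.put((byte) bucket);   // bucket 0..15 leads the key
        key.put(md5(schema));
        key.put(md5(date));
        key.putLong(pkid);
        return key.array();
    }

    public static void main(String[] args) throws Exception {
        byte[] key = buildRowKey(3, "schema_a", "2013-12-27", 42L);
        System.out.println(key.length);  // 41
    }
}
```

Because the bucket byte comes first, all rows for one bucket are contiguous,
which is what makes pre-splitting on bucket boundaries possible.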

We are now facing a challenge in pre-splitting this table (let's call it
'transactions'). Can anyone help me with this?

Regards,
GHK.

Re: Pre splitting the HBase Table for specific row key design

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Hari,

Can you please provide more details on the challenge you are facing?

You can pre-split using the Java Client API, the HBase shell, or even the
Web UI.

For the shell, you can do something like this:

create 'transactions', 'f1', {NUMREGIONS => 15, SPLITALGO => 'HexStringSplit'}


JM

