
Posted to user@hbase.apache.org by Oleg Ruchovets <or...@gmail.com> on 2012/08/29 16:46:05 UTC

bulk loading - region creation/pre-splitting

Hi,
I have a bulk loading job that aggregates user data.
Before I run the bulk loading aggregation, I want to create the regions.
UserIDs look like this:

943e2c6d66d732e06ab257903f240d27
a0617cb2b964690a39b0d93e7fe2f021
ac85b4dee6d8c8495d61201234dfb73e
b8416d5e0fe2a1228f042dffa8d291e2
c422be9e75d28d9afe0f1f98f59cda92
fe6b0ad1822455958586e240eb75c1d7
1790ee2ce4487d976cd9eddd036275d6
344c3de9449a9522d2a4de8bb9e81b02
4fcccd6790aec3056f897741b467d08c
6b67dc1922e4fc0cd6fa31f64bd51ef3
87f1374e7c900a243450f5b5c3a2b2b9
a4180db6a62f300cdecf77310f0010ac

I have ~50,000,000 users. I run the aggregation on a daily basis, and each
day I end up with ~30 regions.
So the objective is to create 30 regions with a more or less equal
distribution.

The question is: what is the best practice for determining the start/end
keys of the regions in my use case?

Thanks in advance
Oleg.

Re: bulk loading - region creation/pre-splitting

Posted by Adrien Mogenet <ad...@gmail.com>.

If you plan to pre-split regions, look at the classes exposed by
RegionSplitter (http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/RegionSplitter.html).
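
To give a rough idea of what pre-splitting on such keys involves, here is a
minimal sketch in Python. It assumes the UserIDs are 32-character lowercase
hex strings spread roughly uniformly over the keyspace (they look like MD5
digests, but that is only a guess). The function name hex_split_points is
made up for this illustration and is not part of HBase.

```python
def hex_split_points(num_regions, key_width=32):
    """Return num_regions - 1 evenly spaced split keys over a hex keyspace.

    Assumes row keys are fixed-width lowercase hex strings that are
    roughly uniformly distributed (an assumption, not a given).
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    keyspace = 16 ** key_width  # number of distinct keys of this width
    return [
        format(keyspace * i // num_regions, "0{}x".format(key_width))
        for i in range(1, num_regions)
    ]


splits = hex_split_points(30)
print(len(splits))  # 29 split points give 30 regions
for key in splits[:3]:
    print(key)
```

The resulting keys could then be passed as split points when the table is
created, for example through the SPLITS option of the shell's create
command. RegionSplitter ships its own hex-oriented algorithm
(HexStringSplit); check how it behaves in the version you run before
relying on this sketch instead.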

Are your keys strings representing hexadecimal values, or are they really
binary encoded? (I mean \xFF\x03 and not "FF03", for example.)
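
The distinction matters because split keys have to use the same encoding as
the row keys. A small Python sketch of the two representations, using one of
the sample UserIDs from the original post (the variable names are only
illustrative):

```python
import binascii

hex_key = "943e2c6d66d732e06ab257903f240d27"  # sample UserID from the post

# Key stored as text: one byte per hex character.
as_text = hex_key.encode("ascii")
# Key stored as raw binary: two hex characters packed into each byte.
as_binary = binascii.unhexlify(hex_key)

print(len(as_text), len(as_binary))  # 32 16
```

With consistently lowercase hex strings, the byte-wise sort order is the
same in both forms, but a split point written as text will not land where
you expect in a table keyed by raw bytes, and vice versa.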

-- 
AM