Posted to user@hbase.apache.org by Matt Wheeler <ma...@explorysmedical.com> on 2011/02/15 19:52:40 UTC

createTable with specified region splits: works great

Pre-creating regions using the byte[][] overload of createTable roughly doubled the performance of our main index table generation.  Our keys start with hashes of the original record IDs, so the data is distributed evenly across all regions.  The keys are ASCII strings that begin with the hash value in hexadecimal, so we specify the split keys as zero-padded ASCII strings of equal length.
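
For anyone wanting to try the same thing, here is a minimal sketch of how split keys like these might be generated. It is not the code from our job: the class name, the method name, and the choice of lowercase hex are all assumptions made for illustration, and it only builds the byte[][] array. The result would then be passed as the second argument to the createTable overload that takes a table descriptor and a byte[][] of split keys.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class HexSplitKeys {

    /**
     * Builds regionCount - 1 split keys that cut the space of fixed-width
     * hexadecimal prefixes into equal ranges. Each key is a zero-padded,
     * lowercase ASCII hex string of exactly hexDigits characters.
     * (Lowercase is an assumption; it must match the case of the real row keys.)
     */
    static byte[][] splitKeys(int regionCount, int hexDigits) {
        if (hexDigits < 1 || hexDigits > 8) {
            throw new IllegalArgumentException("hexDigits must be between 1 and 8");
        }
        long space = 1L << (4 * hexDigits); // number of distinct prefixes
        if (regionCount < 1 || regionCount > space) {
            throw new IllegalArgumentException("regionCount out of range");
        }
        byte[][] keys = new byte[regionCount - 1][];
        for (int i = 1; i < regionCount; i++) {
            long boundary = space * i / regionCount; // start of region i
            String key = String.format("%0" + hexDigits + "x", boundary);
            keys[i - 1] = key.getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }

    public static void main(String[] args) {
        // Four regions over two hex digits: boundaries at 40, 80, c0.
        for (byte[] key : splitKeys(4, 2)) {
            System.out.println(new String(key, StandardCharsets.US_ASCII));
        }
    }
}
```

Because the row keys are longer than the split keys, a row such as "40a3..." sorts after the split key "40" and lands in the second region, which is the intended behavior.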

We try to select an initial region count that avoids any region splits during the index MR job without creating more regions than the table needs.  Performance suffered when we created the table with about three times as many regions as necessary.
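
One way to reason about that count, as a rough sketch rather than the method we actually used: divide the expected final table size by the size at which a region would split (governed by hbase.hregion.max.filesize), with a little headroom. The class name, the parameters, and the headroom factor below are hypothetical, and the estimate assumes the data is spread evenly and lives in a single column family.

```java
// Hypothetical sizing helper, for illustration only.
final class RegionCountEstimate {

    /**
     * Estimates how many regions to pre-create so that no region is
     * expected to reach the split threshold during the load.
     *
     * @param expectedTableBytes estimated final size of the table
     * @param maxRegionBytes     split threshold (assumed to be the value of
     *                           hbase.hregion.max.filesize)
     * @param headroom           safety factor of at least 1.0, e.g. 1.25
     */
    static int initialRegionCount(long expectedTableBytes,
                                  long maxRegionBytes,
                                  double headroom) {
        if (expectedTableBytes < 0 || maxRegionBytes <= 0 || headroom < 1.0) {
            throw new IllegalArgumentException("invalid sizing parameters");
        }
        double needed = expectedTableBytes * headroom / maxRegionBytes;
        // Round up, and never return fewer than one region.
        return (int) Math.max(1L, (long) Math.ceil(needed));
    }

    public static void main(String[] args) {
        long gib = 1L << 30;
        // 100 GiB of data, 1 GiB split threshold, 25% headroom.
        System.out.println(initialRegionCount(100 * gib, gib, 1.25));
    }
}
```

Keeping the headroom factor small matters here, since overshooting by a large multiple is exactly the case where we saw performance suffer.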

- matt

Re: createTable with specified region splits: works great

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's a great report, Matt, thanks for sharing!

J-D
