Posted to user@hbase.apache.org by Mark <st...@gmail.com> on 2011/11/20 20:33:08 UTC

Region Splits

Say we have a use case that has sequential row keys and we have rows 
0-100. Let's assume that 100 rows = the split size. Now when there is a 
split it will split at the halfway mark so there will be two regions as 
follows:

Region1 [START-49]
Region2 [50-END]

So now at this point all inserts will be writing to Region2 only 
correct? Now at some point Region2 will need to split and it will look 
like the following before the split:

Region1 [START-49]
Region2 [50-150]

After the split it will look like:

Region1 [START-49]
Region2 [50-100]
Region3 [150-END]

And this pattern will continue, correct? My question is: when there is a 
use case that has sequential keys, how would any of the older regions 
ever receive any more writes? It seems like they would always be stuck 
at MaxRegionSize/2. Can someone please confirm or clarify this?

Thanks





Re: Region Splits

Posted by Nicolas Spiegelberg <ns...@fb.com>.
The downside of hashing is not that it's unpredictable, but that it's
non-reversible (which is why you need to append the original key).
Reversing should be fine, just make sure that you perform a byte-order
reversal so that you have uniform distribution.
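
As a rough sketch (not from the original post), a byte-order reversal in
Java could look like the following; the id value reuses the 510911
example from elsewhere in the thread, and the helper method is purely
illustrative:

import org.apache.hadoop.hbase.util.Bytes;

public class ReversedKeyExample {

  // Reverse the byte order so the fastest-changing (low-order) bytes of a
  // sequential id come first, spreading consecutive ids across the keyspace.
  static byte[] reverseBytes(byte[] key) {
    byte[] out = new byte[key.length];
    for (int i = 0; i < key.length; i++) {
      out[i] = key[key.length - 1 - i];
    }
    return out;
  }

  public static void main(String[] args) {
    long sequentialId = 510911L;
    byte[] rowKey = reverseBytes(Bytes.toBytes(sequentialId));  // write side
    long recovered = Bytes.toLong(reverseBytes(rowKey));        // read side
    System.out.println(recovered);                              // prints 510911
  }
}

The same reversal applied twice gives back the original id, which is the
reversibility point made above.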

On 11/22/11 7:47 PM, "Mark" <st...@gmail.com> wrote:

>Ok so this would be "short scans"?
>
>In my use case this would be unnecessary so I think Im going to run with
>the reversed id technique. I'm actually surprised I've never heard of
>anyone using this over the non predictable hashing.
>
>On 11/22/11 5:35 PM, Sam Seigal wrote:
>> If you are prefixing your keys with predictable hashes, you can do
>> range scans - i.e. create a scanner for each prefix and then merge
>> results at the client. With unpredictable hashes and key reversals ,
>> this might not be entirely possible.
>>
>> I remember someone on the mailing list mentioning that Mozilla Socorro
>> uses a similar technique. I haven't had a chance to look at their code
>> yet, but that is something you might want to look at.
>>
>> On Tue, Nov 22, 2011 at 5:11 PM, Mark<st...@gmail.com>  wrote:
>>> What to you mean by "short scans"?
>>>
>>> I understand that scans will not be possible with this method but
>>>neither
>>> would they be if I hashed them so it seems like I'm in the same boat
>>>anyway.
>>>
>>> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>>> Mark
>>>>
>>>> Key designs depend on expected access patterns and use cases. From a
>>>> theoretical stand point, what you are saying will work to distribute
>>>> writes but if you want to access a small range, you'll need to fan out
>>>> your reads and can't leverage short scans.
>>>>
>>>> Amandeep
>>>>
>>>> On Nov 22, 2011, at 4:55 PM, Mark<st...@gmail.com>    wrote:
>>>>
>>>>> I just thought of something.
>>>>>
>>>>> In cases where the id is sequential couldn't one simply reverse the
>>>>>id to
>>>>> get more of a uniform distribution?
>>>>>
>>>>> 510911 =>    119015
>>>>> 510912 =>    219015
>>>>> 510913 =>    319015
>>>>> 510914 =>    419015
>>>>>
>>>>> That seems like a reasonable alternative that doesn't require
>>>>>prefixing
>>>>> each row key with an extra 16 bytes. Am I wrong in thinking this
>>>>>could work?
>>>>>
>>>>>
>>>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>>> If you increase the region size to 2GB, then all regions (current
>>>>>>and
>>>>>> new)
>>>>>> will avoid a split until their aggregate StoreFile size reaches that
>>>>>> limit.  Reorganizing the regions for a uniform growth pattern is
>>>>>>really
>>>>>> a
>>>>>> schema design problem.  There is the capability to merge two
>>>>>>adjacent
>>>>>> regions if you know that your data growth pattern is non-uniform.
>>>>>> StumbleUpon&     other companies have more experience with those
>>>>>>utilities
>>>>>> than I do.
>>>>>>
>>>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely
>>>>>>want
>>>>>> to
>>>>>> lean towards increasing the region size.  HFile scalability code is
>>>>>>more
>>>>>> mature/stable than the region splitting code.  Plus, automatic
>>>>>>region
>>>>>> splitting is harder to optimize&     debug when failures occur.
>>>>>>
>>>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>>>> <Sr...@mindtree.com>     wrote:
>>>>>>
>>>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>>>
>>>>>>> What will happen if we increased the region size, say from current
>>>>>>> value
>>>>>>> of 256 MB to a new value of 2GB?
>>>>>>> Will existing regions continue to use only 256 MB space?
>>>>>>>
>>>>>>> Is there a way to reorganize the regions so that each regions
>>>>>>>grows to
>>>>>>> 2GB size?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Srikanth
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Subject: Re: Region Splits
>>>>>>>
>>>>>>> No.  The purpose of major compactions is to merge&     dedupe
>>>>>>>within a
>>>>>>> region
>>>>>>> boundary.  Compactions will not alter region boundaries, except in
>>>>>>>the
>>>>>>> case of splits where a compaction is necessary to filter out any
>>>>>>>Rows
>>>>>>> from
>>>>>>> the parent region that are no longer applicable to the daughter
>>>>>>>region.
>>>>>>>
>>>>>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>>>>>> <Sr...@mindtree.com>     wrote:
>>>>>>>
>>>>>>>> Will major compactions take care of merging "older" regions or
>>>>>>>>adding
>>>>>>>> more key/values to them as number of regions grow?
>>>>>>>>
>>>>>>>> Regard,
>>>>>>>> Srikanth
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>>>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>>>>>> To: user@hbase.apache.org
>>>>>>>> Subject: Re: Region Splits
>>>>>>>>
>>>>>>>> Mark,
>>>>>>>>
>>>>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>>>>> (timestamps
>>>>>>>> etc), you will always be writing to the end of the table and
>>>>>>>>"older"
>>>>>>>> regions will not get any writes. This is one of the arguments
>>>>>>>>against
>>>>>>>> using
>>>>>>>> sequential keys.
>>>>>>>>
>>>>>>>> -ak
>>>>>>>>
>>>>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Say we have a use case that has sequential row keys and we have
>>>>>>>>>rows
>>>>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when
>>>>>>>>>there is
>>>>>>>>> a
>>>>>>>>> split it will split at the halfway mark so there will be two
>>>>>>>>>regions
>>>>>>>>> as
>>>>>>>>> follows:
>>>>>>>>>
>>>>>>>>> Region1 [START-49]
>>>>>>>>> Region2 [50-END]
>>>>>>>>>
>>>>>>>>> So now at this point all inserts will be writing to Region2 only
>>>>>>>>> correct?
>>>>>>>>> Now at some point Region2 will need to split and it will look
>>>>>>>>>like
>>>>>>>>> the
>>>>>>>>> following before the split:
>>>>>>>>>
>>>>>>>>> Region1 [START-49]
>>>>>>>>> Region2 [50-150]
>>>>>>>>>
>>>>>>>>> After the split it will look like:
>>>>>>>>>
>>>>>>>>> Region1 [START-49]
>>>>>>>>> Region2 [50-100]
>>>>>>>>> Region3 [150-END]
>>>>>>>>>
>>>>>>>>> And this pattern will continue correct? My question is when
>>>>>>>>>there is
>>>>>>>>> a
>>>>>>>>> use
>>>>>>>>> case that has sequential keys how would any of the older regions
>>>>>>>>> every
>>>>>>>>> receive anymore writes? It seems like they would always be stuck
>>>>>>>>>at
>>>>>>>>> MaxRegionSize/2. Can someone please confirm or clarify this
>>>>>>>>>issue?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> ________________________________
>>>>>>>>
>>>>>>>> http://www.mindtree.com/email/disclaimer.html


Re: Region Splits

Posted by Mark <st...@gmail.com>.
Ok so this would be "short scans"?

In my use case this would be unnecessary, so I think I'm going to run with 
the reversed-id technique. I'm actually surprised I've never heard of 
anyone using this over non-predictable hashing.

On 11/22/11 5:35 PM, Sam Seigal wrote:
> If you are prefixing your keys with predictable hashes, you can do
> range scans - i.e. create a scanner for each prefix and then merge
> results at the client. With unpredictable hashes and key reversals ,
> this might not be entirely possible.
>
> I remember someone on the mailing list mentioning that Mozilla Socorro
> uses a similar technique. I haven't had a chance to look at their code
> yet, but that is something you might want to look at.
>
> On Tue, Nov 22, 2011 at 5:11 PM, Mark<st...@gmail.com>  wrote:
>> What to you mean by "short scans"?
>>
>> I understand that scans will not be possible with this method but neither
>> would they be if I hashed them so it seems like I'm in the same boat anyway.
>>
>> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>> Mark
>>>
>>> Key designs depend on expected access patterns and use cases. From a
>>> theoretical stand point, what you are saying will work to distribute
>>> writes but if you want to access a small range, you'll need to fan out
>>> your reads and can't leverage short scans.
>>>
>>> Amandeep
>>>
>>> On Nov 22, 2011, at 4:55 PM, Mark<st...@gmail.com>    wrote:
>>>
>>>> I just thought of something.
>>>>
>>>> In cases where the id is sequential couldn't one simply reverse the id to
>>>> get more of a uniform distribution?
>>>>
>>>> 510911 =>    119015
>>>> 510912 =>    219015
>>>> 510913 =>    319015
>>>> 510914 =>    419015
>>>>
>>>> That seems like a reasonable alternative that doesn't require prefixing
>>>> each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>>>>
>>>>
>>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>> If you increase the region size to 2GB, then all regions (current and
>>>>> new)
>>>>> will avoid a split until their aggregate StoreFile size reaches that
>>>>> limit.  Reorganizing the regions for a uniform growth pattern is really
>>>>> a
>>>>> schema design problem.  There is the capability to merge two adjacent
>>>>> regions if you know that your data growth pattern is non-uniform.
>>>>> StumbleUpon&     other companies have more experience with those utilities
>>>>> than I do.
>>>>>
>>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want
>>>>> to
>>>>> lean towards increasing the region size.  HFile scalability code is more
>>>>> mature/stable than the region splitting code.  Plus, automatic region
>>>>> splitting is harder to optimize&     debug when failures occur.
>>>>>
>>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>>> <Sr...@mindtree.com>     wrote:
>>>>>
>>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>>
>>>>>> What will happen if we increased the region size, say from current
>>>>>> value
>>>>>> of 256 MB to a new value of 2GB?
>>>>>> Will existing regions continue to use only 256 MB space?
>>>>>>
>>>>>> Is there a way to reorganize the regions so that each regions grows to
>>>>>> 2GB size?
>>>>>>
>>>>>> Thanks,
>>>>>> Srikanth
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>>> To: user@hbase.apache.org
>>>>>> Subject: Re: Region Splits
>>>>>>
>>>>>> No.  The purpose of major compactions is to merge&     dedupe within a
>>>>>> region
>>>>>> boundary.  Compactions will not alter region boundaries, except in the
>>>>>> case of splits where a compaction is necessary to filter out any Rows
>>>>>> from
>>>>>> the parent region that are no longer applicable to the daughter region.
>>>>>>
>>>>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>>>>> <Sr...@mindtree.com>     wrote:
>>>>>>
>>>>>>> Will major compactions take care of merging "older" regions or adding
>>>>>>> more key/values to them as number of regions grow?
>>>>>>>
>>>>>>> Regard,
>>>>>>> Srikanth
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>>>>> To: user@hbase.apache.org
>>>>>>> Subject: Re: Region Splits
>>>>>>>
>>>>>>> Mark,
>>>>>>>
>>>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>>>> (timestamps
>>>>>>> etc), you will always be writing to the end of the table and "older"
>>>>>>> regions will not get any writes. This is one of the arguments against
>>>>>>> using
>>>>>>> sequential keys.
>>>>>>>
>>>>>>> -ak
>>>>>>>
>>>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>>>>>> a
>>>>>>>> split it will split at the halfway mark so there will be two regions
>>>>>>>> as
>>>>>>>> follows:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-END]
>>>>>>>>
>>>>>>>> So now at this point all inserts will be writing to Region2 only
>>>>>>>> correct?
>>>>>>>> Now at some point Region2 will need to split and it will look like
>>>>>>>> the
>>>>>>>> following before the split:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-150]
>>>>>>>>
>>>>>>>> After the split it will look like:
>>>>>>>>
>>>>>>>> Region1 [START-49]
>>>>>>>> Region2 [50-100]
>>>>>>>> Region3 [150-END]
>>>>>>>>
>>>>>>>> And this pattern will continue correct? My question is when there is
>>>>>>>> a
>>>>>>>> use
>>>>>>>> case that has sequential keys how would any of the older regions
>>>>>>>> every
>>>>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> ________________________________
>>>>>>>
>>>>>>> http://www.mindtree.com/email/disclaimer.html

Re: Region Splits

Posted by Sam Seigal <se...@yahoo.com>.
If you are prefixing your keys with predictable hashes, you can do
range scans - i.e. create a scanner for each prefix and then merge
results at the client. With unpredictable hashes and key reversals,
this might not be entirely possible.
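
A minimal sketch of the fan-out-and-merge idea, assuming a one-byte hash
prefix and 16 buckets; the table name, column layout, and bucket count
are invented for illustration:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixFanOutScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");        // illustrative table name
    byte[] logicalStart = Bytes.toBytes("20111120");  // the range you really want
    byte[] logicalStop  = Bytes.toBytes("20111121");
    int buckets = 16;                                 // assumes a 1-byte prefix 0x00..0x0F
    List<Result> merged = new ArrayList<Result>();
    for (int b = 0; b < buckets; b++) {
      byte[] prefix = new byte[] { (byte) b };
      // One scanner per prefix bucket over the same logical range.
      Scan scan = new Scan(Bytes.add(prefix, logicalStart),
                           Bytes.add(prefix, logicalStop));
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          merged.add(r);   // client-side merge; re-sort by original key if needed
        }
      } finally {
        scanner.close();
      }
    }
    table.close();
  }
}

Because the prefix is computable from the key, every bucket can be
scanned over the same logical sub-range, which is what an unpredictable
hash or a reversed key does not let you do.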

I remember someone on the mailing list mentioning that Mozilla Socorro
uses a similar technique. I haven't had a chance to look at their code
yet, but that is something you might want to look at.

On Tue, Nov 22, 2011 at 5:11 PM, Mark <st...@gmail.com> wrote:
> What to you mean by "short scans"?
>
> I understand that scans will not be possible with this method but neither
> would they be if I hashed them so it seems like I'm in the same boat anyway.
>
> On 11/22/11 5:00 PM, Amandeep Khurana wrote:
>>
>> Mark
>>
>> Key designs depend on expected access patterns and use cases. From a
>> theoretical stand point, what you are saying will work to distribute
>> writes but if you want to access a small range, you'll need to fan out
>> your reads and can't leverage short scans.
>>
>> Amandeep
>>
>> On Nov 22, 2011, at 4:55 PM, Mark<st...@gmail.com>  wrote:
>>
>>> I just thought of something.
>>>
>>> In cases where the id is sequential couldn't one simply reverse the id to
>>> get more of a uniform distribution?
>>>
>>> 510911 =>  119015
>>> 510912 =>  219015
>>> 510913 =>  319015
>>> 510914 =>  419015
>>>
>>> That seems like a reasonable alternative that doesn't require prefixing
>>> each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>>>
>>>
>>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>>>
>>>> If you increase the region size to 2GB, then all regions (current and
>>>> new)
>>>> will avoid a split until their aggregate StoreFile size reaches that
>>>> limit.  Reorganizing the regions for a uniform growth pattern is really
>>>> a
>>>> schema design problem.  There is the capability to merge two adjacent
>>>> regions if you know that your data growth pattern is non-uniform.
>>>> StumbleUpon&   other companies have more experience with those utilities
>>>> than I do.
>>>>
>>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want
>>>> to
>>>> lean towards increasing the region size.  HFile scalability code is more
>>>> mature/stable than the region splitting code.  Plus, automatic region
>>>> splitting is harder to optimize&   debug when failures occur.
>>>>
>>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>>> <Sr...@mindtree.com>   wrote:
>>>>
>>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>>
>>>>> What will happen if we increased the region size, say from current
>>>>> value
>>>>> of 256 MB to a new value of 2GB?
>>>>> Will existing regions continue to use only 256 MB space?
>>>>>
>>>>> Is there a way to reorganize the regions so that each regions grows to
>>>>> 2GB size?
>>>>>
>>>>> Thanks,
>>>>> Srikanth
>>>>>
>>>>> -----Original Message-----
>>>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: Region Splits
>>>>>
>>>>> No.  The purpose of major compactions is to merge&   dedupe within a
>>>>> region
>>>>> boundary.  Compactions will not alter region boundaries, except in the
>>>>> case of splits where a compaction is necessary to filter out any Rows
>>>>> from
>>>>> the parent region that are no longer applicable to the daughter region.
>>>>>
>>>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>>>> <Sr...@mindtree.com>   wrote:
>>>>>
>>>>>> Will major compactions take care of merging "older" regions or adding
>>>>>> more key/values to them as number of regions grow?
>>>>>>
>>>>>> Regard,
>>>>>> Srikanth
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>>>> To: user@hbase.apache.org
>>>>>> Subject: Re: Region Splits
>>>>>>
>>>>>> Mark,
>>>>>>
>>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>>> (timestamps
>>>>>> etc), you will always be writing to the end of the table and "older"
>>>>>> regions will not get any writes. This is one of the arguments against
>>>>>> using
>>>>>> sequential keys.
>>>>>>
>>>>>> -ak
>>>>>>
>>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>>>>> a
>>>>>>> split it will split at the halfway mark so there will be two regions
>>>>>>> as
>>>>>>> follows:
>>>>>>>
>>>>>>> Region1 [START-49]
>>>>>>> Region2 [50-END]
>>>>>>>
>>>>>>> So now at this point all inserts will be writing to Region2 only
>>>>>>> correct?
>>>>>>> Now at some point Region2 will need to split and it will look like
>>>>>>> the
>>>>>>> following before the split:
>>>>>>>
>>>>>>> Region1 [START-49]
>>>>>>> Region2 [50-150]
>>>>>>>
>>>>>>> After the split it will look like:
>>>>>>>
>>>>>>> Region1 [START-49]
>>>>>>> Region2 [50-100]
>>>>>>> Region3 [150-END]
>>>>>>>
>>>>>>> And this pattern will continue correct? My question is when there is
>>>>>>> a
>>>>>>> use
>>>>>>> case that has sequential keys how would any of the older regions
>>>>>>> every
>>>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ________________________________
>>>>>>
>>>>>> http://www.mindtree.com/email/disclaimer.html
>

Re: Region Splits

Posted by Mark <st...@gmail.com>.
What do you mean by "short scans"?

I understand that scans will not be possible with this method, but 
neither would they be if I hashed them, so it seems like I'm in the same 
boat anyway.

On 11/22/11 5:00 PM, Amandeep Khurana wrote:
> Mark
>
> Key designs depend on expected access patterns and use cases. From a
> theoretical stand point, what you are saying will work to distribute
> writes but if you want to access a small range, you'll need to fan out
> your reads and can't leverage short scans.
>
> Amandeep
>
> On Nov 22, 2011, at 4:55 PM, Mark<st...@gmail.com>  wrote:
>
>> I just thought of something.
>>
>> In cases where the id is sequential couldn't one simply reverse the id to get more of a uniform distribution?
>>
>> 510911 =>  119015
>> 510912 =>  219015
>> 510913 =>  319015
>> 510914 =>  419015
>>
>> That seems like a reasonable alternative that doesn't require prefixing each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>>
>>
>> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>>> If you increase the region size to 2GB, then all regions (current and new)
>>> will avoid a split until their aggregate StoreFile size reaches that
>>> limit.  Reorganizing the regions for a uniform growth pattern is really a
>>> schema design problem.  There is the capability to merge two adjacent
>>> regions if you know that your data growth pattern is non-uniform.
>>> StumbleUpon&   other companies have more experience with those utilities
>>> than I do.
>>>
>>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
>>> lean towards increasing the region size.  HFile scalability code is more
>>> mature/stable than the region splitting code.  Plus, automatic region
>>> splitting is harder to optimize&   debug when failures occur.
>>>
>>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>>> <Sr...@mindtree.com>   wrote:
>>>
>>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>>
>>>> What will happen if we increased the region size, say from current value
>>>> of 256 MB to a new value of 2GB?
>>>> Will existing regions continue to use only 256 MB space?
>>>>
>>>> Is there a way to reorganize the regions so that each regions grows to
>>>> 2GB size?
>>>>
>>>> Thanks,
>>>> Srikanth
>>>>
>>>> -----Original Message-----
>>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>>> To: user@hbase.apache.org
>>>> Subject: Re: Region Splits
>>>>
>>>> No.  The purpose of major compactions is to merge&   dedupe within a region
>>>> boundary.  Compactions will not alter region boundaries, except in the
>>>> case of splits where a compaction is necessary to filter out any Rows from
>>>> the parent region that are no longer applicable to the daughter region.
>>>>
>>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>>> <Sr...@mindtree.com>   wrote:
>>>>
>>>>> Will major compactions take care of merging "older" regions or adding
>>>>> more key/values to them as number of regions grow?
>>>>>
>>>>> Regard,
>>>>> Srikanth
>>>>>
>>>>> -----Original Message-----
>>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>>> To: user@hbase.apache.org
>>>>> Subject: Re: Region Splits
>>>>>
>>>>> Mark,
>>>>>
>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>> (timestamps
>>>>> etc), you will always be writing to the end of the table and "older"
>>>>> regions will not get any writes. This is one of the arguments against
>>>>> using
>>>>> sequential keys.
>>>>>
>>>>> -ak
>>>>>
>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>   wrote:
>>>>>
>>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>>>>> split it will split at the halfway mark so there will be two regions as
>>>>>> follows:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-END]
>>>>>>
>>>>>> So now at this point all inserts will be writing to Region2 only
>>>>>> correct?
>>>>>> Now at some point Region2 will need to split and it will look like the
>>>>>> following before the split:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-150]
>>>>>>
>>>>>> After the split it will look like:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-100]
>>>>>> Region3 [150-END]
>>>>>>
>>>>>> And this pattern will continue correct? My question is when there is a
>>>>>> use
>>>>>> case that has sequential keys how would any of the older regions every
>>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> ________________________________
>>>>>
>>>>> http://www.mindtree.com/email/disclaimer.html

Re: Region Splits

Posted by Amandeep Khurana <am...@gmail.com>.
Mark

Key designs depend on expected access patterns and use cases. From a
theoretical standpoint, what you are saying will work to distribute
writes, but if you want to access a small range, you'll need to fan out
your reads and can't leverage short scans.

Amandeep

On Nov 22, 2011, at 4:55 PM, Mark <st...@gmail.com> wrote:

> I just thought of something.
>
> In cases where the id is sequential couldn't one simply reverse the id to get more of a uniform distribution?
>
> 510911 => 119015
> 510912 => 219015
> 510913 => 319015
> 510914 => 419015
>
> That seems like a reasonable alternative that doesn't require prefixing each row key with an extra 16 bytes. Am I wrong in thinking this could work?
>
>
> On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
>> If you increase the region size to 2GB, then all regions (current and new)
>> will avoid a split until their aggregate StoreFile size reaches that
>> limit.  Reorganizing the regions for a uniform growth pattern is really a
>> schema design problem.  There is the capability to merge two adjacent
>> regions if you know that your data growth pattern is non-uniform.
>> StumbleUpon&  other companies have more experience with those utilities
>> than I do.
>>
>> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
>> lean towards increasing the region size.  HFile scalability code is more
>> mature/stable than the region splitting code.  Plus, automatic region
>> splitting is harder to optimize&  debug when failures occur.
>>
>> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
>> <Sr...@mindtree.com>  wrote:
>>
>>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>>
>>> What will happen if we increased the region size, say from current value
>>> of 256 MB to a new value of 2GB?
>>> Will existing regions continue to use only 256 MB space?
>>>
>>> Is there a way to reorganize the regions so that each regions grows to
>>> 2GB size?
>>>
>>> Thanks,
>>> Srikanth
>>>
>>> -----Original Message-----
>>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>>> Sent: Tuesday, November 22, 2011 10:59 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: Region Splits
>>>
>>> No.  The purpose of major compactions is to merge&  dedupe within a region
>>> boundary.  Compactions will not alter region boundaries, except in the
>>> case of splits where a compaction is necessary to filter out any Rows from
>>> the parent region that are no longer applicable to the daughter region.
>>>
>>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>>> <Sr...@mindtree.com>  wrote:
>>>
>>>> Will major compactions take care of merging "older" regions or adding
>>>> more key/values to them as number of regions grow?
>>>>
>>>> Regard,
>>>> Srikanth
>>>>
>>>> -----Original Message-----
>>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>>> Sent: Monday, November 21, 2011 7:25 AM
>>>> To: user@hbase.apache.org
>>>> Subject: Re: Region Splits
>>>>
>>>> Mark,
>>>>
>>>> Yes, your understanding is correct. If your keys are sequential
>>>> (timestamps
>>>> etc), you will always be writing to the end of the table and "older"
>>>> regions will not get any writes. This is one of the arguments against
>>>> using
>>>> sequential keys.
>>>>
>>>> -ak
>>>>
>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>  wrote:
>>>>
>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>>>> split it will split at the halfway mark so there will be two regions as
>>>>> follows:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-END]
>>>>>
>>>>> So now at this point all inserts will be writing to Region2 only
>>>>> correct?
>>>>> Now at some point Region2 will need to split and it will look like the
>>>>> following before the split:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-150]
>>>>>
>>>>> After the split it will look like:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-100]
>>>>> Region3 [150-END]
>>>>>
>>>>> And this pattern will continue correct? My question is when there is a
>>>>> use
>>>>> case that has sequential keys how would any of the older regions every
>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>> ________________________________
>>>>
>>>> http://www.mindtree.com/email/disclaimer.html

Re: Region Splits

Posted by Mark <st...@gmail.com>.
I just thought of something.

In cases where the id is sequential couldn't one simply reverse the id 
to get more of a uniform distribution?

510911 => 119015
510912 => 219015
510913 => 319015
510914 => 419015

That seems like a reasonable alternative that doesn't require prefixing 
each row key with an extra 16 bytes. Am I wrong in thinking this could work?
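
A literal, illustrative rendering of that digit reversal in Java
(nothing HBase-specific about it):

public class DigitReversal {
  public static void main(String[] args) {
    String id = "510911";
    // Reverse the decimal digits on the way in, reverse again on the way out.
    String rowKey = new StringBuilder(id).reverse().toString();        // "119015"
    String original = new StringBuilder(rowKey).reverse().toString();  // "510911"
    System.out.println(rowKey + " -> " + original);
  }
}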


On 11/22/11 12:46 PM, Nicolas Spiegelberg wrote:
> If you increase the region size to 2GB, then all regions (current and new)
> will avoid a split until their aggregate StoreFile size reaches that
> limit.  Reorganizing the regions for a uniform growth pattern is really a
> schema design problem.  There is the capability to merge two adjacent
> regions if you know that your data growth pattern is non-uniform.
> StumbleUpon&  other companies have more experience with those utilities
> than I do.
>
> Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
> lean towards increasing the region size.  HFile scalability code is more
> mature/stable than the region splitting code.  Plus, automatic region
> splitting is harder to optimize&  debug when failures occur.
>
> On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
> <Sr...@mindtree.com>  wrote:
>
>> Thanks Nicolas for the clarification.  I had a follow-up query.
>>
>> What will happen if we increased the region size, say from current value
>> of 256 MB to a new value of 2GB?
>> Will existing regions continue to use only 256 MB space?
>>
>> Is there a way to reorganize the regions so that each regions grows to
>> 2GB size?
>>
>> Thanks,
>> Srikanth
>>
>> -----Original Message-----
>> From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>> Sent: Tuesday, November 22, 2011 10:59 PM
>> To: user@hbase.apache.org
>> Subject: Re: Region Splits
>>
>> No.  The purpose of major compactions is to merge&  dedupe within a region
>> boundary.  Compactions will not alter region boundaries, except in the
>> case of splits where a compaction is necessary to filter out any Rows from
>> the parent region that are no longer applicable to the daughter region.
>>
>> On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
>> <Sr...@mindtree.com>  wrote:
>>
>>> Will major compactions take care of merging "older" regions or adding
>>> more key/values to them as number of regions grow?
>>>
>>> Regard,
>>> Srikanth
>>>
>>> -----Original Message-----
>>> From: Amandeep Khurana [mailto:amansk@gmail.com]
>>> Sent: Monday, November 21, 2011 7:25 AM
>>> To: user@hbase.apache.org
>>> Subject: Re: Region Splits
>>>
>>> Mark,
>>>
>>> Yes, your understanding is correct. If your keys are sequential
>>> (timestamps
>>> etc), you will always be writing to the end of the table and "older"
>>> regions will not get any writes. This is one of the arguments against
>>> using
>>> sequential keys.
>>>
>>> -ak
>>>
>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>  wrote:
>>>
>>>> Say we have a use case that has sequential row keys and we have rows
>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>>> split it will split at the halfway mark so there will be two regions as
>>>> follows:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-END]
>>>>
>>>> So now at this point all inserts will be writing to Region2 only
>>>> correct?
>>>> Now at some point Region2 will need to split and it will look like the
>>>> following before the split:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-150]
>>>>
>>>> After the split it will look like:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-100]
>>>> Region3 [150-END]
>>>>
>>>> And this pattern will continue correct? My question is when there is a
>>>> use
>>>> case that has sequential keys how would any of the older regions every
>>>> receive anymore writes? It seems like they would always be stuck at
>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>
>>> ________________________________
>>>
>>> http://www.mindtree.com/email/disclaimer.html

Re: Region Splits

Posted by Nicolas Spiegelberg <ns...@fb.com>.
If you increase the region size to 2GB, then all regions (current and new)
will avoid a split until their aggregate StoreFile size reaches that
limit.  Reorganizing the regions for a uniform growth pattern is really a
schema design problem.  There is the capability to merge two adjacent
regions if you know that your data growth pattern is non-uniform.
StumbleUpon & other companies have more experience with those utilities
than I do.

Note: With the introduction of HFileV2 in 0.92, you'll definitely want to
lean towards increasing the region size.  HFile scalability code is more
mature/stable than the region splitting code.  Plus, automatic region
splitting is harder to optimize & debug when failures occur.
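
For reference, the cluster-wide split threshold is
hbase.hregion.max.filesize in hbase-site.xml. A rough sketch of raising
it for a single existing table through the client API of that era might
look like this (the table name is illustrative, and the
disable/modify/enable sequence assumes schema changes still require the
table to be offline):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseRegionSize {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    byte[] tableName = Bytes.toBytes("events");      // illustrative table name
    HTableDescriptor desc = admin.getTableDescriptor(tableName);
    desc.setMaxFileSize(2L * 1024 * 1024 * 1024);    // 2GB per region before a split
    admin.disableTable(tableName);                   // schema change with table offline
    admin.modifyTable(tableName, desc);
    admin.enableTable(tableName);
  }
}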

On 11/22/11 12:20 PM, "Srikanth P. Shreenivas"
<Sr...@mindtree.com> wrote:

>Thanks Nicolas for the clarification.  I had a follow-up query.
>
>What will happen if we increased the region size, say from current value
>of 256 MB to a new value of 2GB?
>Will existing regions continue to use only 256 MB space?
>
>Is there a way to reorganize the regions so that each regions grows to
>2GB size?
>
>Thanks,
>Srikanth
>
>-----Original Message-----
>From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com]
>Sent: Tuesday, November 22, 2011 10:59 PM
>To: user@hbase.apache.org
>Subject: Re: Region Splits
>
>No.  The purpose of major compactions is to merge & dedupe within a region
>boundary.  Compactions will not alter region boundaries, except in the
>case of splits where a compaction is necessary to filter out any Rows from
>the parent region that are no longer applicable to the daughter region.
>
>On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
><Sr...@mindtree.com> wrote:
>
>>Will major compactions take care of merging "older" regions or adding
>>more key/values to them as number of regions grow?
>>
>>Regard,
>>Srikanth
>>
>>-----Original Message-----
>>From: Amandeep Khurana [mailto:amansk@gmail.com]
>>Sent: Monday, November 21, 2011 7:25 AM
>>To: user@hbase.apache.org
>>Subject: Re: Region Splits
>>
>>Mark,
>>
>>Yes, your understanding is correct. If your keys are sequential
>>(timestamps
>>etc), you will always be writing to the end of the table and "older"
>>regions will not get any writes. This is one of the arguments against
>>using
>>sequential keys.
>>
>>-ak
>>
>>On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:
>>
>>> Say we have a use case that has sequential row keys and we have rows
>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>> split it will split at the halfway mark so there will be two regions as
>>> follows:
>>>
>>> Region1 [START-49]
>>> Region2 [50-END]
>>>
>>> So now at this point all inserts will be writing to Region2 only
>>>correct?
>>> Now at some point Region2 will need to split and it will look like the
>>> following before the split:
>>>
>>> Region1 [START-49]
>>> Region2 [50-150]
>>>
>>> After the split it will look like:
>>>
>>> Region1 [START-49]
>>> Region2 [50-100]
>>> Region3 [150-END]
>>>
>>> And this pattern will continue correct? My question is when there is a
>>>use
>>> case that has sequential keys how would any of the older regions every
>>> receive anymore writes? It seems like they would always be stuck at
>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>
>>
>>________________________________
>>
>>http://www.mindtree.com/email/disclaimer.html
>


RE: Region Splits

Posted by "Srikanth P. Shreenivas" <Sr...@mindtree.com>.
Thanks Nicolas for the clarification.  I had a follow-up query.

What will happen if we increase the region size, say from the current value of 256 MB to a new value of 2GB?
Will existing regions continue to use only 256 MB of space?

Is there a way to reorganize the regions so that each region grows to 2GB?

Thanks,
Srikanth

-----Original Message-----
From: Nicolas Spiegelberg [mailto:nspiegelberg@fb.com] 
Sent: Tuesday, November 22, 2011 10:59 PM
To: user@hbase.apache.org
Subject: Re: Region Splits

No.  The purpose of major compactions is to merge & dedupe within a region
boundary.  Compactions will not alter region boundaries, except in the
case of splits where a compaction is necessary to filter out any Rows from
the parent region that are no longer applicable to the daughter region.

On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
<Sr...@mindtree.com> wrote:

>Will major compactions take care of merging "older" regions or adding
>more key/values to them as number of regions grow?
>
>Regard,
>Srikanth
>
>-----Original Message-----
>From: Amandeep Khurana [mailto:amansk@gmail.com]
>Sent: Monday, November 21, 2011 7:25 AM
>To: user@hbase.apache.org
>Subject: Re: Region Splits
>
>Mark,
>
>Yes, your understanding is correct. If your keys are sequential
>(timestamps
>etc), you will always be writing to the end of the table and "older"
>regions will not get any writes. This is one of the arguments against
>using
>sequential keys.
>
>-ak
>
>On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:
>
>> Say we have a use case that has sequential row keys and we have rows
>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>> split it will split at the halfway mark so there will be two regions as
>> follows:
>>
>> Region1 [START-49]
>> Region2 [50-END]
>>
>> So now at this point all inserts will be writing to Region2 only
>>correct?
>> Now at some point Region2 will need to split and it will look like the
>> following before the split:
>>
>> Region1 [START-49]
>> Region2 [50-150]
>>
>> After the split it will look like:
>>
>> Region1 [START-49]
>> Region2 [50-100]
>> Region3 [150-END]
>>
>> And this pattern will continue correct? My question is when there is a
>>use
>> case that has sequential keys how would any of the older regions every
>> receive anymore writes? It seems like they would always be stuck at
>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>
>> Thanks
>>
>>
>>
>>
>>
>
>________________________________
>
>http://www.mindtree.com/email/disclaimer.html


Re: Region Splits

Posted by Nicolas Spiegelberg <ns...@fb.com>.
No.  The purpose of major compactions is to merge & dedupe within a region
boundary.  Compactions will not alter region boundaries, except in the
case of splits where a compaction is necessary to filter out any Rows from
the parent region that are no longer applicable to the daughter region.

On 11/22/11 9:04 AM, "Srikanth P. Shreenivas"
<Sr...@mindtree.com> wrote:

>Will major compactions take care of merging "older" regions or adding
>more key/values to them as number of regions grow?
>
>Regard,
>Srikanth
>
>-----Original Message-----
>From: Amandeep Khurana [mailto:amansk@gmail.com]
>Sent: Monday, November 21, 2011 7:25 AM
>To: user@hbase.apache.org
>Subject: Re: Region Splits
>
>Mark,
>
>Yes, your understanding is correct. If your keys are sequential
>(timestamps
>etc), you will always be writing to the end of the table and "older"
>regions will not get any writes. This is one of the arguments against
>using
>sequential keys.
>
>-ak
>
>On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:
>
>> Say we have a use case that has sequential row keys and we have rows
>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>> split it will split at the halfway mark so there will be two regions as
>> follows:
>>
>> Region1 [START-49]
>> Region2 [50-END]
>>
>> So now at this point all inserts will be writing to Region2 only
>>correct?
>> Now at some point Region2 will need to split and it will look like the
>> following before the split:
>>
>> Region1 [START-49]
>> Region2 [50-150]
>>
>> After the split it will look like:
>>
>> Region1 [START-49]
>> Region2 [50-100]
>> Region3 [150-END]
>>
>> And this pattern will continue correct? My question is when there is a
>>use
>> case that has sequential keys how would any of the older regions every
>> receive anymore writes? It seems like they would always be stuck at
>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>
>> Thanks
>>
>>
>>
>>
>>
>
>________________________________
>
>http://www.mindtree.com/email/disclaimer.html


RE: Region Splits

Posted by "Srikanth P. Shreenivas" <Sr...@mindtree.com>.
Will major compactions take care of merging "older" regions or adding more key/values to them as the number of regions grows?

Regards,
Srikanth

-----Original Message-----
From: Amandeep Khurana [mailto:amansk@gmail.com]
Sent: Monday, November 21, 2011 7:25 AM
To: user@hbase.apache.org
Subject: Re: Region Splits

Mark,

Yes, your understanding is correct. If your keys are sequential (timestamps
etc), you will always be writing to the end of the table and "older"
regions will not get any writes. This is one of the arguments against using
sequential keys.

-ak

On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:

> Say we have a use case that has sequential row keys and we have rows
> 0-100. Let's assume that 100 rows = the split size. Now when there is a
> split it will split at the halfway mark so there will be two regions as
> follows:
>
> Region1 [START-49]
> Region2 [50-END]
>
> So now at this point all inserts will be writing to Region2 only correct?
> Now at some point Region2 will need to split and it will look like the
> following before the split:
>
> Region1 [START-49]
> Region2 [50-150]
>
> After the split it will look like:
>
> Region1 [START-49]
> Region2 [50-100]
> Region3 [150-END]
>
> And this pattern will continue correct? My question is when there is a use
> case that has sequential keys how would any of the older regions every
> receive anymore writes? It seems like they would always be stuck at
> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>
> Thanks
>
>
>
>
>

________________________________

http://www.mindtree.com/email/disclaimer.html

Re: Region Splits

Posted by Mark <st...@gmail.com>.
As far as rowkey length goes, should it be a concern that we are now 
adding 16 bytes to each key? Would it be sufficient to take, say, the 
first 4 bytes of the MD5 hash?
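
A small sketch of what a truncated 4-byte prefix could look like;
MessageDigest is standard JDK, Bytes is HBase's utility class, and the
id value is just the example used elsewhere in the thread:

import java.security.MessageDigest;
import java.util.Arrays;
import org.apache.hadoop.hbase.util.Bytes;

public class TruncatedMd5Prefix {
  public static void main(String[] args) throws Exception {
    byte[] oldKey = Bytes.toBytes(510911L);
    byte[] md5 = MessageDigest.getInstance("MD5").digest(oldKey);
    byte[] prefix = Arrays.copyOf(md5, 4);        // keep only the first 4 bytes
    byte[] newKey = Bytes.add(prefix, oldKey);    // 4-byte prefix + original key
    System.out.println(Bytes.toStringBinary(newKey));
  }
}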

On 11/21/11 7:55 AM, Mark wrote:
> Damn, I was hoping my understanding was flawed.
>
> In your example I am guessing the addition of old_key suffix is to 
> prevent against any possible collision. Is that correct?
>
> On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>> Sequential writes are also an argument for pre-splitting and using hash
>> prefixing.  In other words, presplit your table into N regions 
>> instead of
>> the default of 1&  transform your keys into:
>>
>> new_key = md5(old_key) + old_key
>>
>> Using this method your sequential writes under the old_key are now 
>> spread
>> evenly across all regions.  There are some limitations to hash 
>> prefixing,
>> such as non-sequential scans across row boundaries.  However, it's a
>> tradeoff between even distribution&  advanced query options.
>>
>> On 11/20/11 7:54 PM, "Amandeep Khurana"<am...@gmail.com>  wrote:
>>
>>> Mark,
>>>
>>> Yes, your understanding is correct. If your keys are sequential
>>> (timestamps
>>> etc), you will always be writing to the end of the table and "older"
>>> regions will not get any writes. This is one of the arguments against
>>> using
>>> sequential keys.
>>>
>>> -ak
>>>
>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>  
>>> wrote:
>>>
>>>> Say we have a use case that has sequential row keys and we have rows
>>>> 0-100. Let's assume that 100 rows = the split size. Now when there 
>>>> is a
>>>> split it will split at the halfway mark so there will be two 
>>>> regions as
>>>> follows:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-END]
>>>>
>>>> So now at this point all inserts will be writing to Region2 only
>>>> correct?
>>>> Now at some point Region2 will need to split and it will look like the
>>>> following before the split:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-150]
>>>>
>>>> After the split it will look like:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-100]
>>>> Region3 [150-END]
>>>>
>>>> And this pattern will continue correct? My question is when there is a
>>>> use
>>>> case that has sequential keys how would any of the older regions every
>>>> receive anymore writes? It seems like they would always be stuck at
>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>

Re: Region Splits

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there-

The last part of 6.3.2.3 is important: "Expect tradeoffs when designing
rowkeys".  Some of this stuff you just have to prototype.

In terms of performance...

http://hbase.apache.org/book.html#keyvalue

.. if you have huge keys, you'll feel it there.

For the MR part, see...

http://hbase.apache.org/book.html#mapreduce.example.readwrite
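
Building on that read/write example, a rough sketch of a one-off,
map-only copy job that rewrites every row under a salted key might look
like the following; the source and target table names are invented, and
the salt follows the md5(old_key) + old_key scheme from elsewhere in the
thread:

import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.Job;

public class RekeyCopyJob {

  // Reads every row of the source table and writes it to the target table
  // under md5(old_key) + old_key.
  public static class RekeyMapper extends TableMapper<ImmutableBytesWritable, Put> {
    private MessageDigest md5;

    @Override
    protected void setup(Context context) throws IOException {
      try {
        md5 = MessageDigest.getInstance("MD5");
      } catch (NoSuchAlgorithmException e) {
        throw new IOException(e);
      }
    }

    @Override
    public void map(ImmutableBytesWritable row, Result columns, Context context)
        throws IOException, InterruptedException {
      byte[] oldKey = row.get();
      byte[] newKey = Bytes.add(md5.digest(oldKey), oldKey);
      Put put = new Put(newKey);
      // Copy every cell under the new row key, preserving timestamps.
      for (KeyValue kv : columns.raw()) {
        put.add(kv.getFamily(), kv.getQualifier(), kv.getTimestamp(), kv.getValue());
      }
      context.write(new ImmutableBytesWritable(newKey), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "rekey-copy");
    job.setJarByClass(RekeyCopyJob.class);
    Scan scan = new Scan();
    scan.setCaching(500);          // batch rows per RPC
    scan.setCacheBlocks(false);    // don't pollute the block cache from MR
    TableMapReduceUtil.initTableMapperJob("old_table", scan, RekeyMapper.class,
        ImmutableBytesWritable.class, Put.class, job);
    TableMapReduceUtil.initTableReducerJob("new_table", null, job);
    job.setNumReduceTasks(0);      // map-only copy
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}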





On 11/21/11 1:46 PM, "Mark" <st...@gmail.com> wrote:

>I was actually referring to the HBase book that states:
>
>
>        6.3.2.3. Rowkey Length
>
>Keep them as short as is reasonable such that they can still be useful
>for required data access
>
>
>Didn't know if adding 16 bytes to each row key would have any
>performance implications.
>
>As a side question having, what would be the best way of converting
>existing sequential keys to a hashed version? M/R script? Is there a
>simple dump?
>
>
>On 11/21/11 8:18 AM, Nicolas Spiegelberg wrote:
>> Mark: you are correct about the old_key suffix.  I'm assuming that
>>you're
>> worried about this because of keyspace size, correct?  The default
>> algorithm for pre-splitting assumes a 32-bit (4 byte) hash prefix, which
>> should be perfectly scalable for all use cases in the near future of
>> computing.  Really, you could get away with an 8-bit hash prefix if your
>> cluster is small&  you plan to auto-split after a certain size.  This is
>> available if you use UniformSplit but will require a little power user
>> investigation.  I don't think anybody deviates from the default, mainly
>> just because current use cases aren't as finicky about the extra
>>overhead.
>>
>> For the medium term, note that HBASE-4218 will also introduce key
>> compression&  further reduce overhead.  This won't be available until 94
>> or so, but you probably won't be worried about an extra 4 bytes until
>> then.  We currently use the HexStringSplit algorithm in production,
>>which
>> is 8-bytes but is human-readable.  With preliminary investigation, we
>> predict an 80%+ compression in our key size (currently ~80 bytes) with
>> HBASE-4218.
>>
>> On 11/21/11 9:55 AM, "Mark"<st...@gmail.com>  wrote:
>>
>>> Damn, I was hoping my understanding was flawed.
>>>
>>> In your example I am guessing the addition of old_key suffix is to
>>> prevent against any possible collision. Is that correct?
>>>
>>> On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>>>> Sequential writes are also an argument for pre-splitting and using
>>>>hash
>>>> prefixing.  In other words, presplit your table into N regions instead
>>>> of
>>>> the default of 1&   transform your keys into:
>>>>
>>>> new_key = md5(old_key) + old_key
>>>>
>>>> Using this method your sequential writes under the old_key are now
>>>> spread
>>>> evenly across all regions.  There are some limitations to hash
>>>> prefixing,
>>>> such as non-sequential scans across row boundaries.  However, it's a
>>>> tradeoff between even distribution&   advanced query options.
>>>>
>>>> On 11/20/11 7:54 PM, "Amandeep Khurana"<am...@gmail.com>   wrote:
>>>>
>>>>> Mark,
>>>>>
>>>>> Yes, your understanding is correct. If your keys are sequential
>>>>> (timestamps
>>>>> etc), you will always be writing to the end of the table and "older"
>>>>> regions will not get any writes. This is one of the arguments against
>>>>> using
>>>>> sequential keys.
>>>>>
>>>>> -ak
>>>>>
>>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there
>>>>>>is
>>>>>> a
>>>>>> split it will split at the halfway mark so there will be two regions
>>>>>> as
>>>>>> follows:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-END]
>>>>>>
>>>>>> So now at this point all inserts will be writing to Region2 only
>>>>>> correct?
>>>>>> Now at some point Region2 will need to split and it will look like
>>>>>>the
>>>>>> following before the split:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-150]
>>>>>>
>>>>>> After the split it will look like:
>>>>>>
>>>>>> Region1 [START-49]
>>>>>> Region2 [50-100]
>>>>>> Region3 [150-END]
>>>>>>
>>>>>> And this pattern will continue correct? My question is when there
>>>>>>is a
>>>>>> use
>>>>>> case that has sequential keys how would any of the older regions
>>>>>>every
>>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>



Re: Region Splits

Posted by Mark <st...@gmail.com>.
I was actually referring to the HBase book that states:


        6.3.2.3. Rowkey Length

Keep them as short as is reasonable such that they can still be useful 
for required data access


Didn't know if adding 16 bytes to each row key would have any 
performance implications.

As a side question, what would be the best way of converting existing 
sequential keys to a hashed version? An M/R script? Is there a simple 
dump?


On 11/21/11 8:18 AM, Nicolas Spiegelberg wrote:
> Mark: you are correct about the old_key suffix.  I'm assuming that you're
> worried about this because of keyspace size, correct?  The default
> algorithm for pre-splitting assumes a 32-bit (4 byte) hash prefix, which
> should be perfectly scalable for all use cases in the near future of
> computing.  Really, you could get away with an 8-bit hash prefix if your
> cluster is small&  you plan to auto-split after a certain size.  This is
> available if you use UniformSplit but will require a little power user
> investigation.  I don't think anybody deviates from the default, mainly
> just because current use cases aren't as finicky about the extra overhead.
>
> For the medium term, note that HBASE-4218 will also introduce key
> compression&  further reduce overhead.  This won't be available until 94
> or so, but you probably won't be worried about an extra 4 bytes until
> then.  We currently use the HexStringSplit algorithm in production, which
> is 8-bytes but is human-readable.  With preliminary investigation, we
> predict an 80%+ compression in our key size (currently ~80 bytes) with
> HBASE-4218.
>
> On 11/21/11 9:55 AM, "Mark"<st...@gmail.com>  wrote:
>
>> Damn, I was hoping my understanding was flawed.
>>
>> In your example I am guessing the addition of old_key suffix is to
>> prevent against any possible collision. Is that correct?
>>
>> On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>>> Sequential writes are also an argument for pre-splitting and using hash
>>> prefixing.  In other words, presplit your table into N regions instead
>>> of
>>> the default of 1&   transform your keys into:
>>>
>>> new_key = md5(old_key) + old_key
>>>
>>> Using this method your sequential writes under the old_key are now
>>> spread
>>> evenly across all regions.  There are some limitations to hash
>>> prefixing,
>>> such as non-sequential scans across row boundaries.  However, it's a
>>> tradeoff between even distribution&   advanced query options.
>>>
>>> On 11/20/11 7:54 PM, "Amandeep Khurana"<am...@gmail.com>   wrote:
>>>
>>>> Mark,
>>>>
>>>> Yes, your understanding is correct. If your keys are sequential
>>>> (timestamps
>>>> etc), you will always be writing to the end of the table and "older"
>>>> regions will not get any writes. This is one of the arguments against
>>>> using
>>>> sequential keys.
>>>>
>>>> -ak
>>>>
>>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>> wrote:
>>>>
>>>>> Say we have a use case that has sequential row keys and we have rows
>>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>>> a
>>>>> split it will split at the halfway mark so there will be two regions
>>>>> as
>>>>> follows:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-END]
>>>>>
>>>>> So now at this point all inserts will be writing to Region2 only
>>>>> correct?
>>>>> Now at some point Region2 will need to split and it will look like the
>>>>> following before the split:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-150]
>>>>>
>>>>> After the split it will look like:
>>>>>
>>>>> Region1 [START-49]
>>>>> Region2 [50-100]
>>>>> Region3 [150-END]
>>>>>
>>>>> And this pattern will continue correct? My question is when there is a
>>>>> use
>>>>> case that has sequential keys how would any of the older regions every
>>>>> receive anymore writes? It seems like they would always be stuck at
>>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>

Re: Region Splits

Posted by Nicolas Spiegelberg <ns...@fb.com>.
Mark: you are correct about the old_key suffix.  I'm assuming that you're
worried about this because of keyspace size, correct?  The default
algorithm for pre-splitting assumes a 32-bit (4 byte) hash prefix, which
should be perfectly scalable for all use cases in the near future of
computing.  Really, you could get away with an 8-bit hash prefix if your
cluster is small & you plan to auto-split after a certain size.  This is
available if you use UniformSplit but will require a little power user
investigation.  I don't think anybody deviates from the default, mainly
just because current use cases aren't as finicky about the extra overhead.
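
As a rough illustration of pre-splitting at table-creation time (the
table and family names and the region count are made up; the one-byte
split keys assume row keys that begin with a uniformly distributed hash
byte):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PreSplitTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("events");   // illustrative
    desc.addFamily(new HColumnDescriptor("d"));
    int numRegions = 16;
    // 15 split keys that evenly divide the first (hash) byte of the key.
    byte[][] splits = new byte[numRegions - 1][];
    for (int i = 1; i < numRegions; i++) {
      splits[i - 1] = new byte[] { (byte) (i * (256 / numRegions)) };
    }
    admin.createTable(desc, splits);
  }
}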

For the medium term, note that HBASE-4218 will also introduce key
compression & further reduce overhead.  This won't be available until 94
or so, but you probably won't be worried about an extra 4 bytes until
then.  We currently use the HexStringSplit algorithm in production, which
is 8-bytes but is human-readable.  With preliminary investigation, we
predict an 80%+ compression in our key size (currently ~80 bytes) with
HBASE-4218.

On 11/21/11 9:55 AM, "Mark" <st...@gmail.com> wrote:

>Damn, I was hoping my understanding was flawed.
>
>In your example I am guessing the addition of old_key suffix is to
>prevent against any possible collision. Is that correct?
>
>On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
>> Sequential writes are also an argument for pre-splitting and using hash
>> prefixing.  In other words, presplit your table into N regions instead
>>of
>> the default of 1&  transform your keys into:
>>
>> new_key = md5(old_key) + old_key
>>
>> Using this method your sequential writes under the old_key are now
>>spread
>> evenly across all regions.  There are some limitations to hash
>>prefixing,
>> such as non-sequential scans across row boundaries.  However, it's a
>> tradeoff between even distribution&  advanced query options.
>>
>> On 11/20/11 7:54 PM, "Amandeep Khurana"<am...@gmail.com>  wrote:
>>
>>> Mark,
>>>
>>> Yes, your understanding is correct. If your keys are sequential
>>> (timestamps
>>> etc), you will always be writing to the end of the table and "older"
>>> regions will not get any writes. This is one of the arguments against
>>> using
>>> sequential keys.
>>>
>>> -ak
>>>
>>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>
>>>wrote:
>>>
>>>> Say we have a use case that has sequential row keys and we have rows
>>>> 0-100. Let's assume that 100 rows = the split size. Now when there is
>>>>a
>>>> split it will split at the halfway mark so there will be two regions
>>>>as
>>>> follows:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-END]
>>>>
>>>> So now at this point all inserts will be writing to Region2 only
>>>> correct?
>>>> Now at some point Region2 will need to split and it will look like the
>>>> following before the split:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-150]
>>>>
>>>> After the split it will look like:
>>>>
>>>> Region1 [START-49]
>>>> Region2 [50-100]
>>>> Region3 [150-END]
>>>>
>>>> And this pattern will continue correct? My question is when there is a
>>>> use
>>>> case that has sequential keys how would any of the older regions every
>>>> receive anymore writes? It seems like they would always be stuck at
>>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>>


Re: Region Splits

Posted by Mark <st...@gmail.com>.
Damn, I was hoping my understanding was flawed.

In your example I am guessing the addition of the old_key suffix is to 
protect against any possible collision. Is that correct?

On 11/20/11 9:39 PM, Nicolas Spiegelberg wrote:
> Sequential writes are also an argument for pre-splitting and using hash
> prefixing.  In other words, presplit your table into N regions instead of
> the default of 1&  transform your keys into:
>
> new_key = md5(old_key) + old_key
>
> Using this method your sequential writes under the old_key are now spread
> evenly across all regions.  There are some limitations to hash prefixing,
> such as non-sequential scans across row boundaries.  However, it's a
> tradeoff between even distribution&  advanced query options.
>
> On 11/20/11 7:54 PM, "Amandeep Khurana"<am...@gmail.com>  wrote:
>
>> Mark,
>>
>> Yes, your understanding is correct. If your keys are sequential
>> (timestamps
>> etc), you will always be writing to the end of the table and "older"
>> regions will not get any writes. This is one of the arguments against
>> using
>> sequential keys.
>>
>> -ak
>>
>> On Sun, Nov 20, 2011 at 11:33 AM, Mark<st...@gmail.com>  wrote:
>>
>>> Say we have a use case that has sequential row keys and we have rows
>>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>>> split it will split at the halfway mark so there will be two regions as
>>> follows:
>>>
>>> Region1 [START-49]
>>> Region2 [50-END]
>>>
>>> So now at this point all inserts will be writing to Region2 only
>>> correct?
>>> Now at some point Region2 will need to split and it will look like the
>>> following before the split:
>>>
>>> Region1 [START-49]
>>> Region2 [50-150]
>>>
>>> After the split it will look like:
>>>
>>> Region1 [START-49]
>>> Region2 [50-100]
>>> Region3 [150-END]
>>>
>>> And this pattern will continue correct? My question is when there is a
>>> use
>>> case that has sequential keys how would any of the older regions every
>>> receive anymore writes? It seems like they would always be stuck at
>>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>>
>>> Thanks
>>>
>>>
>>>
>>>
>>>

Re: Region Splits

Posted by Nicolas Spiegelberg <ns...@fb.com>.
Sequential writes are also an argument for pre-splitting and using hash
prefixing.  In other words, presplit your table into N regions instead of
the default of 1 & transform your keys into:

new_key = md5(old_key) + old_key

Using this method your sequential writes under the old_key are now spread
evenly across all regions.  There are some limitations to hash prefixing,
such as non-sequential scans across row boundaries.  However, it's a
tradeoff between even distribution & advanced query options.
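
A concrete sketch of that transform (the table name, column family, and
id are invented for illustration; the full 16-byte MD5 is shown here,
while a truncated prefix is discussed elsewhere in the thread):

import java.security.MessageDigest;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class HashPrefixedWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");            // illustrative table
    byte[] oldKey = Bytes.toBytes(510911L);                // a sequential id
    byte[] newKey = Bytes.add(
        MessageDigest.getInstance("MD5").digest(oldKey),   // 16-byte hash prefix
        oldKey);                                           // keep the original key
    Put put = new Put(newKey);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("..."));
    table.put(put);
    table.close();
  }
}

Keeping the original key as the suffix is what makes the transform
usable on the read path: given old_key you can recompute new_key, even
though the hash itself is not reversible.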

On 11/20/11 7:54 PM, "Amandeep Khurana" <am...@gmail.com> wrote:

>Mark,
>
>Yes, your understanding is correct. If your keys are sequential
>(timestamps
>etc), you will always be writing to the end of the table and "older"
>regions will not get any writes. This is one of the arguments against
>using
>sequential keys.
>
>-ak
>
>On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:
>
>> Say we have a use case that has sequential row keys and we have rows
>> 0-100. Let's assume that 100 rows = the split size. Now when there is a
>> split it will split at the halfway mark so there will be two regions as
>> follows:
>>
>> Region1 [START-49]
>> Region2 [50-END]
>>
>> So now at this point all inserts will be writing to Region2 only
>>correct?
>> Now at some point Region2 will need to split and it will look like the
>> following before the split:
>>
>> Region1 [START-49]
>> Region2 [50-150]
>>
>> After the split it will look like:
>>
>> Region1 [START-49]
>> Region2 [50-100]
>> Region3 [150-END]
>>
>> And this pattern will continue correct? My question is when there is a
>>use
>> case that has sequential keys how would any of the older regions every
>> receive anymore writes? It seems like they would always be stuck at
>> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>>
>> Thanks
>>
>>
>>
>>
>>


Re: Region Splits

Posted by Amandeep Khurana <am...@gmail.com>.
Mark,

Yes, your understanding is correct. If your keys are sequential (timestamps
etc), you will always be writing to the end of the table and "older"
regions will not get any writes. This is one of the arguments against using
sequential keys.

-ak

On Sun, Nov 20, 2011 at 11:33 AM, Mark <st...@gmail.com> wrote:

> Say we have a use case that has sequential row keys and we have rows
> 0-100. Let's assume that 100 rows = the split size. Now when there is a
> split it will split at the halfway mark so there will be two regions as
> follows:
>
> Region1 [START-49]
> Region2 [50-END]
>
> So now at this point all inserts will be writing to Region2 only correct?
> Now at some point Region2 will need to split and it will look like the
> following before the split:
>
> Region1 [START-49]
> Region2 [50-150]
>
> After the split it will look like:
>
> Region1 [START-49]
> Region2 [50-100]
> Region3 [150-END]
>
> And this pattern will continue correct? My question is when there is a use
> case that has sequential keys how would any of the older regions every
> receive anymore writes? It seems like they would always be stuck at
> MaxRegionSize/2. Can someone please confirm or clarify this issue?
>
> Thanks
>
>
>
>
>