Posted to user@hbase.apache.org by Shushant Arora <sh...@gmail.com> on 2015/05/22 06:57:24 UTC

avoiding hot spot for timestamp prefix key

Can I avoid region hotspotting with a custom region split policy in HBase
versions later than 0.96?

The key is of the form timestamp#guid.
So can I have a custom region split policy that uses the second part of the
key (i.e. the guid) as the split criterion, and so avoid the hot spot?

Re: avoiding hot spot for timestamp prefix key

Posted by Vladimir Rodionov <vl...@gmail.com>.
RegionSplitPolicy only allows you to customize the split point (a row key).
All rows that sort before this split point go to the first daughter region;
the split row and everything after it go to the second.

The answer to the original question is no: you cannot have a custom
policy based on the second part of a key.
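
A minimal sketch of why this is so, in plain Python rather than the HBase
API. It assumes row keys compare as byte strings (as HBase row keys do) and
uses a hypothetical `daughter_for` helper; the sample keys follow the
timestamp#guid pattern from this thread, and the split point is made up.

```python
# Illustrative only: models a split as a single cut in sorted key order.
# `daughter_for` is a hypothetical helper, not part of HBase.

def daughter_for(row_key: bytes, split_point: bytes) -> str:
    """Rows sorting before the split point go to the first daughter."""
    return "first" if row_key < split_point else "second"

rows = sorted([
    b"2015-05-22 00:02:01#AB12EC77778888945",
    b"2015-05-22 00:02:02#CD9870001234AB457",
    b"2015-05-22 00:02:03#0000AAAA1111BBBB2",
])

split_point = b"2015-05-22 00:02:02"  # assumed split point
placement = {row: daughter_for(row, split_point) for row in rows}

# The guid never decides the daughter: the timestamp prefix is compared first.
# New rows carry later timestamps, so they keep landing in the last region.
newest = b"2015-05-22 00:02:04#FFFF000011112222A"
newest_daughter = daughter_for(newest, split_point)
```

Whatever split point a policy picks, every newly written key sorts after it,
so the most recent region still takes all the writes.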

-Vlad

On Fri, May 22, 2015 at 2:43 AM, Michael Segel <mi...@hotmail.com>
wrote:

> This is why I created HBASE-12853.
>
> So you don’t have to specify a custom split policy.
>
> Of course the simple solutions are often passed over because of NIH.  ;-)
>
> To be blunt… You encapsulate the bucketing code so that you have a single
> API in to HBase regardless of the type of storage underneath.
> KISS is maintained and you stop people from attempting to do stupid
> things.   (cc’ing dev@hbase) As a product owner, (read PMC / committers)
> you want to keep people from mucking about in the internals.  While its
> true that its open source, and you will have some who want to muck around,
> you also have to consider the corporate users who need something that is
> reliable and less customized so that its supportable.  This is the vendor’s
> dilemma. (hint Cloudera , Horton, IBM, MapR)  You’re selling support to
> HBase and if a customer starts to overload internals with their own code,
> good luck in supporting it.  This is why you do things like 12853 because
> it makes your life easier.
>
> This isn’t a sexy solution. Its core engineering work.
>
> HTH
>
> -Mike
>
> > On May 22, 2015, at 4:22 AM, Shushant Arora <sh...@gmail.com>
> wrote:
> >
> > since custom split policy is based on second part i.e guid so key with
> > first part as 2015-05-22 00:01:02 will be in which region how will that
> be
> > identified?
> >
> >
> > On Fri, May 22, 2015 at 1:12 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> The custom split policy needs to respect the fact that timestamp is the
> >> leading part of the rowkey.
> >>
> >> This would avoid the overlap you mentioned.
> >>
> >> Cheers
> >>
> >>
> >>
> >>> On May 21, 2015, at 11:55 PM, Shushant Arora <
> shushantarora09@gmail.com>
> >> wrote:
> >>>
> >>> guid change with every key, patterns is
> >>> 2015-05-22 00:02:01#AB12EC77778888945
> >>> 2015-05-22 00:02:02#CD9870001234AB457
> >>>
> >>> When we specify custom split algorithm , it may happen that keys of
> same
> >>> sorting order range say (1-7) lies in region R1 as well as in region
> R2?
> >>> Then how .META. table will make further lookups at read time,  say I
> >> search
> >>> for key 3, then will it search in both the regions R1 and R2 ?
> >>>
> >>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yu...@gmail.com> wrote:
> >>>>
> >>>> Does guid change with every key ?
> >>>>
> >>>> bq. use second part of key
> >>>>
> >>>> I don't think so. Suppose first row in the parent region is
> >>>> '1432104178817#321'. After split, the first row in first daughter
> region
> >>>> would still be '1432104178817#321'. Right ?
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <
> >> shushantarora09@gmail.com
> >>>> wrote:
> >>>>
> >>>>> Can I avoid hotspot of region with custom region split policy in
> hbase
> >>>>>> 0.96 .
> >>>>>
> >>>>> Key is of the form timestamp#guid.
> >>>>> So can I have custom region split policy and use second part of key
> >> (i.e)
> >>>>> guid as region split criteria and avoid hot spot??
> >>>>
> >>
>
>

Re: avoiding hot spot for timestamp prefix key

Posted by Michael Segel <mi...@hotmail.com>.
This is why I created HBASE-12853. 

So you don’t have to specify a custom split policy. 

Of course the simple solutions are often passed over because of NIH.  ;-) 

To be blunt: you encapsulate the bucketing code so that you have a single API into HBase regardless of the type of storage underneath.
KISS is maintained and you stop people from attempting to do stupid things. (cc’ing dev@hbase) As a product owner (read: PMC / committers), you want to keep people from mucking about in the internals. While it’s true that it’s open source, and you will have some who want to muck around, you also have to consider the corporate users who need something that is reliable and less customized so that it’s supportable. This is the vendor’s dilemma (hint: Cloudera, Hortonworks, IBM, MapR). You’re selling support for HBase, and if a customer starts to overload internals with their own code, good luck supporting it. This is why you do things like 12853: it makes your life easier.
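
A minimal sketch of the kind of bucketing wrapper described above, in plain
Python. It is not the HBASE-12853 design or any HBase API: the bucket count,
the `bucket_of`, `bucketed_key` and `scan_prefixes` helpers, and the
two-digit prefix format are all assumptions made for illustration.

```python
import hashlib
from typing import List

NUM_BUCKETS = 8  # assumed value; would be fixed when the table is created

def bucket_of(guid: str) -> int:
    """Derive a stable bucket from the guid (md5 is used only for spreading)."""
    digest = hashlib.md5(guid.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS

def bucketed_key(timestamp: str, guid: str) -> str:
    """Prefix the original timestamp#guid key with its bucket number."""
    return f"{bucket_of(guid):02d}#{timestamp}#{guid}"

def scan_prefixes(timestamp_prefix: str) -> List[str]:
    """A time-range read must fan out: one scan prefix per bucket."""
    return [f"{b:02d}#{timestamp_prefix}" for b in range(NUM_BUCKETS)]

key = bucketed_key("2015-05-22 00:02:01", "AB12EC77778888945")
prefixes = scan_prefixes("2015-05-22 00:02")
```

With a prefix like this, consecutive timestamps spread over several key
ranges instead of one. The trade-off is on the read side: a scan over a time
range becomes one scan per bucket, with the results merged by the caller,
which is the part a single shared API would hide.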

This isn’t a sexy solution. It’s core engineering work.

HTH

-Mike

> On May 22, 2015, at 4:22 AM, Shushant Arora <sh...@gmail.com> wrote:
> 
> since custom split policy is based on second part i.e guid so key with
> first part as 2015-05-22 00:01:02 will be in which region how will that be
> identified?
> 
> 
> On Fri, May 22, 2015 at 1:12 PM, Ted Yu <yu...@gmail.com> wrote:
> 
>> The custom split policy needs to respect the fact that timestamp is the
>> leading part of the rowkey.
>> 
>> This would avoid the overlap you mentioned.
>> 
>> Cheers
>> 
>> 
>> 
>>> On May 21, 2015, at 11:55 PM, Shushant Arora <sh...@gmail.com>
>> wrote:
>>> 
>>> guid change with every key, patterns is
>>> 2015-05-22 00:02:01#AB12EC77778888945
>>> 2015-05-22 00:02:02#CD9870001234AB457
>>> 
>>> When we specify custom split algorithm , it may happen that keys of same
>>> sorting order range say (1-7) lies in region R1 as well as in region R2?
>>> Then how .META. table will make further lookups at read time,  say I
>> search
>>> for key 3, then will it search in both the regions R1 and R2 ?
>>> 
>>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yu...@gmail.com> wrote:
>>>> 
>>>> Does guid change with every key ?
>>>> 
>>>> bq. use second part of key
>>>> 
>>>> I don't think so. Suppose first row in the parent region is
>>>> '1432104178817#321'. After split, the first row in first daughter region
>>>> would still be '1432104178817#321'. Right ?
>>>> 
>>>> Cheers
>>>> 
>>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <
>> shushantarora09@gmail.com
>>>> wrote:
>>>> 
>>>>> Can I avoid hotspot of region with custom region split policy in hbase
>>>>>> 0.96 .
>>>>> 
>>>>> Key is of the form timestamp#guid.
>>>>> So can I have custom region split policy and use second part of key
>> (i.e)
>>>>> guid as region split criteria and avoid hot spot??
>>>> 
>> 


Re: avoiding hot spot for timestamp prefix key

Posted by Shushant Arora <sh...@gmail.com>.
If the custom split policy is based on the second part of the key (i.e. the
guid), how would it be determined which region holds a key whose first part
is 2015-05-22 00:01:02?


On Fri, May 22, 2015 at 1:12 PM, Ted Yu <yu...@gmail.com> wrote:

> The custom split policy needs to respect the fact that timestamp is the
> leading part of the rowkey.
>
> This would avoid the overlap you mentioned.
>
> Cheers
>
>
>
> > On May 21, 2015, at 11:55 PM, Shushant Arora <sh...@gmail.com>
> wrote:
> >
> > guid change with every key, patterns is
> > 2015-05-22 00:02:01#AB12EC77778888945
> > 2015-05-22 00:02:02#CD9870001234AB457
> >
> > When we specify custom split algorithm , it may happen that keys of same
> > sorting order range say (1-7) lies in region R1 as well as in region R2?
> > Then how .META. table will make further lookups at read time,  say I
> search
> > for key 3, then will it search in both the regions R1 and R2 ?
> >
> >> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yu...@gmail.com> wrote:
> >>
> >> Does guid change with every key ?
> >>
> >> bq. use second part of key
> >>
> >> I don't think so. Suppose first row in the parent region is
> >> '1432104178817#321'. After split, the first row in first daughter region
> >> would still be '1432104178817#321'. Right ?
> >>
> >> Cheers
> >>
> >> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <
> shushantarora09@gmail.com
> >> wrote:
> >>
> >>> Can I avoid hotspot of region with custom region split policy in hbase
> >>>> 0.96 .
> >>>
> >>> Key is of the form timestamp#guid.
> >>> So can I have custom region split policy and use second part of key
> (i.e)
> >>> guid as region split criteria and avoid hot spot??
> >>
>

Re: avoiding hot spot for timestamp prefix key

Posted by Ted Yu <yu...@gmail.com>.
The custom split policy needs to respect the fact that timestamp is the leading part of the rowkey. 

This would avoid the overlap you mentioned. 

Cheers



> On May 21, 2015, at 11:55 PM, Shushant Arora <sh...@gmail.com> wrote:
> 
> guid change with every key, patterns is
> 2015-05-22 00:02:01#AB12EC77778888945
> 2015-05-22 00:02:02#CD9870001234AB457
> 
> When we specify custom split algorithm , it may happen that keys of same
> sorting order range say (1-7) lies in region R1 as well as in region R2?
> Then how .META. table will make further lookups at read time,  say I search
> for key 3, then will it search in both the regions R1 and R2 ?
> 
>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yu...@gmail.com> wrote:
>> 
>> Does guid change with every key ?
>> 
>> bq. use second part of key
>> 
>> I don't think so. Suppose first row in the parent region is
>> '1432104178817#321'. After split, the first row in first daughter region
>> would still be '1432104178817#321'. Right ?
>> 
>> Cheers
>> 
>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <shushantarora09@gmail.com
>> wrote:
>> 
>>> Can I avoid hotspot of region with custom region split policy in hbase
>>>> 0.96 .
>>> 
>>> Key is of the form timestamp#guid.
>>> So can I have custom region split policy and use second part of key (i.e)
>>> guid as region split criteria and avoid hot spot??
>> 

Re: avoiding hot spot for timestamp prefix key

Posted by Shushant Arora <sh...@gmail.com>.
The guid changes with every key. The pattern is:
2015-05-22 00:02:01#AB12EC77778888945
2015-05-22 00:02:02#CD9870001234AB457

When we specify a custom split algorithm, could keys in the same sort-order
range, say 1-7, end up in region R1 as well as in region R2?
If so, how will the .META. table resolve lookups at read time? Say I search
for key 3: will it search both regions R1 and R2?
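
A minimal sketch of how a lookup resolves to exactly one region, assuming (as
in HBase) that regions cover disjoint, contiguous key ranges identified by
their start keys. The region names, start keys, and `find_region` helper are
hypothetical; this models the idea, not the actual meta table lookup code.

```python
import bisect

# Sorted region start keys; the first region starts at the empty key.
start_keys = [b"", b"4", b"8"]
region_names = ["R1", "R2", "R3"]

def find_region(row_key: bytes) -> str:
    """Return the one region whose range [start, next_start) holds row_key."""
    index = bisect.bisect_right(start_keys, row_key) - 1
    return region_names[index]

region_for_3 = find_region(b"3")  # falls in R1's range
region_for_4 = find_region(b"4")  # a start key belongs to the region it starts
region_for_9 = find_region(b"9")  # falls in the last region
```

Because a split only cuts one range in two at the split point, a given key
can fall in just one region's range, so a read never has to consult both.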

On Fri, May 22, 2015 at 10:48 AM, Ted Yu <yu...@gmail.com> wrote:

> Does guid change with every key ?
>
> bq. use second part of key
>
> I don't think so. Suppose first row in the parent region is
> '1432104178817#321'. After split, the first row in first daughter region
> would still be '1432104178817#321'. Right ?
>
> Cheers
>
> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <shushantarora09@gmail.com
> >
> wrote:
>
> > Can I avoid hotspot of region with custom region split policy in hbase
> > >0.96 .
> >
> > Key is of the form timestamp#guid.
> > So can I have custom region split policy and use second part of key (i.e)
> > guid as region split criteria and avoid hot spot??
> >
>

Re: avoiding hot spot for timestamp prefix key

Posted by Ted Yu <yu...@gmail.com>.
Does the guid change with every key?

bq. use second part of key

I don't think so. Suppose the first row in the parent region is
'1432104178817#321'. After the split, the first row in the first daughter
region would still be '1432104178817#321'. Right?

Cheers

On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <sh...@gmail.com>
wrote:

> Can I avoid hotspot of region with custom region split policy in hbase
> >0.96 .
>
> Key is of the form timestamp#guid.
> So can I have custom region split policy and use second part of key (i.e)
> guid as region split criteria and avoid hot spot??
>