Posted to user@hbase.apache.org by Lior Schachter <li...@infolinks.com> on 2011/03/20 18:06:35 UTC
hash function per table
Hi,
What is the API or configuration for changing the default hash function for
a specific HTable?
thanks,
Lior
Re: hash function per table
Posted by Chris Tarnas <cf...@email.com>.
This question is fairly common on the list; see, for example:
http://search-hadoop.com/m/jusKg172GBC/timestamp+hash++key/v=threaded
-chris
Re: hash function per table
Posted by Niels Nuyttens <ni...@gmail.com>.
Hi guys,
this is an interesting discussion; please excuse me for hijacking it and
posing an illustrative problem:
Suppose one is getting data from monitoring devices. A composite key
could be made using <date>_<monitoring_type>. Would this lead to
hotspots? Could hashing solve this problem, and wouldn't I then lose the
advantage of being able to list my monitoring data chronologically?
Thanks in advance,
Niels
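One way to look at Niels's question: whichever component leads the key decides both the write distribution and the scan order. A plain-Java sketch (the key format and names are illustrative, not an HBase API) showing that leading with the monitoring type instead of the date still keeps each type's rows in chronological order under lexicographic sorting:

```java
import java.util.Arrays;

public class CompositeKeys {
    // Hypothetical key builder: leading with the monitoring type spreads
    // writes across the keyspace (different types land in different key
    // ranges), while rows for any one type still sort by date.
    static String key(String type, String date) {
        return type + "_" + date;
    }

    public static void main(String[] args) {
        String[] keys = {
            key("temp", "20110318"), key("humidity", "20110320"),
            key("temp", "20110320"), key("humidity", "20110318"),
        };
        // HBase stores rows in exactly this lexicographic order.
        Arrays.sort(keys);
        System.out.println(Arrays.toString(keys));
        // → [humidity_20110318, humidity_20110320, temp_20110318, temp_20110320]
    }
}
```

The trade-off: a fully chronological listing across all types then requires one sub-scan per type, merged on the client.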
Re: hash function per table
Posted by Oleg Ruchovets <or...@gmail.com>.
Can you share more information about your tests?
I still have a couple of issues that I don't understand:
1) public Scan setTimeRange(long minStamp, long maxStamp) vs. the
startKey/endKey approach: which is the better approach, and is there a
significant difference in execution time between the two?
2) Suppose that while inserting data I distribute it across the
regions and build an index at the same time. Will the index help me
improve the scan process?
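On question 1, the two mechanisms are not interchangeable: start/end keys prune which rows are read at all, while setTimeRange filters cells by timestamp inside whatever row range the scan walks. A rough illustration in plain Java, with a TreeMap standing in for HBase's sorted row keys (no HBase API involved):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class RangeVsFilter {
    // startKey/endKey: a range scan seeks straight to the first matching
    // key and stops at the end key, never touching rows outside
    // [start, stop) - analogous to TreeMap.subMap on sorted keys.
    static SortedMap<String, Long> rangeScan(TreeMap<String, Long> rows,
                                             String start, String stop) {
        return rows.subMap(start, stop);
    }

    public static void main(String[] args) {
        // Rows keyed by date, sorted lexicographically as HBase stores them.
        TreeMap<String, Long> rows = new TreeMap<>();
        rows.put("20110318_a", 1L);
        rows.put("20110319_b", 2L);
        rows.put("20110320_c", 3L);
        rows.put("20110321_d", 4L);
        System.out.println(rangeScan(rows, "20110319", "20110321").keySet());
        // → [20110319_b, 20110320_c]
        // Scan.setTimeRange, by contrast, filters on cell timestamps while
        // the scan still reads whatever row range it was given; with no row
        // range it has to walk the whole table.
    }
}
```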
Re: hash function per table
Posted by Andrew Purtell <ap...@apache.org>.
Or use a bulk load process to import sequential data as new stores all in one shot.
- Andy
Re: hash function per table
Posted by Lior Schachter <li...@infolinks.com>.
What's the performance penalty when scanning with a row prefix filter
instead of with a start/end key?
Can it still work (in reasonable processing time) when the table contains
billions of records?
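A prefix scan need not pay a full-table penalty: a prefix is equivalent to a start/stop-row pair, where the stop row is the prefix with its last byte incremented. A plain-Java sketch (a TreeSet standing in for the sorted table; names are illustrative):

```java
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixScan {
    // Turn a prefix into an exclusive stop row by incrementing its last
    // byte. (Fine while that byte is not 0xFF; a robust version would
    // carry, trimming trailing 0xFF bytes first.)
    static String stopRowFor(String prefix) {
        char[] c = prefix.toCharArray();
        c[c.length - 1]++;
        return new String(c);
    }

    // Prefix scan as a range scan: seeks straight to the prefix and stops
    // at the incremented prefix. A filter alone, with no start row, would
    // read from the beginning of the table - which is what hurts at
    // billions of rows.
    static SortedSet<String> prefixScan(TreeSet<String> rows, String prefix) {
        return rows.subSet(prefix, stopRowFor(prefix));
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>();
        rows.add("20110319_x");
        rows.add("20110320_a");
        rows.add("20110320_b");
        rows.add("20110321_y");
        System.out.println(prefixScan(rows, "20110320"));
        // → [20110320_a, 20110320_b]
    }
}
```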
On Sun, Mar 20, 2011 at 10:03 PM, Pete Haidinyak <ja...@cox.net> wrote:
> I went through this discussion a month or so ago and came away with the
> opinion that you can either have an efficient load with random key but then
> have an inefficient 'scan' not using start and end rows, or have an
> inefficient import with sequential key and then scan using start and end
> rows.
>
> -Pete
>
>
>
> On Sun, 20 Mar 2011 12:52:24 -0700, Oleg Ruchovets <or...@gmail.com>
> wrote:
>
> Actually discussion started from this post:
>>
>>
>>
>> http://search-hadoop.com/m/XX3nW68JsY1/hbase+insertion+optimisation&subj=hbase+insertion+optimisation+
>>
>> Simply inserting the data in which row key <date>_<somedata> I noticed
>> that
>> only one node works (region to which data were writing). In case we have
>> 10-15 nodes I think it is inefficient to write data to only one region. I
>> want to get an effect that data will be inserted to as much as possible
>> nodes simultaneously. Correct me guys , but in this case writing job
>> will take less time , am I write?
>>
>> Oleg.
>>
>> On Sun, Mar 20, 2011 at 8:57 PM, Chris Tarnas <cf...@email.com> wrote:
>>
>> There is none - HBase uses a total order partitioner. The straight key
>>> value itself determines which region a row is put into. This allows for
>>> very
>>> rapid scans of sequential data, among other things but does mean it is
>>> easier to hotspot regions. Key design is very important.
>>>
>>> -chris
>>>
>>> On Mar 20, 2011, at 11:41 AM, Lior Schachter wrote:
>>>
>>> > the hash function that distributes the rows between the regions.
>>> >
>>> > On Sun, Mar 20, 2011 at 8:36 PM, Stack <st...@duboce.net> wrote:
>>> >
>>> >> Hash? Which hash are you referring to sir?
>>> >> St.Ack
>>> >>
>>> >> On Sun, Mar 20, 2011 at 10:06 AM, Lior Schachter <liors@infolinks.com
>>> >
>>> >> wrote:
>>> >>> Hi,
>>> >>> What is the API or configuration for changing the default hash
>>> function
>>> >> for
>>> >>> a specific htable.
>>> >>>
>>> >>> thanks,
>>> >>> Lior
>>> >>>
>>> >>
>>>
>>>
>
Re: hash function per table
Posted by Pete Haidinyak <ja...@cox.net>.
I went through this discussion a month or so ago and came away with the
opinion that you can either have an efficient load with random keys but
then an inefficient 'scan' that cannot use start and end rows, or an
inefficient import with sequential keys and then scan using start and end
rows.
-Pete
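A common middle ground between these two extremes is a small fixed salt: writes spread over N buckets, and a range 'scan' becomes N sub-scans merged on the client. A plain-Java sketch (collections standing in for HBase scans; the bucket count, key format, and all names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class BucketScan {
    // A small, fixed number of buckets keeps the read-side fan-out bounded.
    static final int BUCKETS = 4;

    // With keys of the form "<bucket>_<date>", a date-range query is
    // answered by one range sub-scan per bucket, merged afterwards.
    static List<String> scanRange(TreeMap<String, String> table,
                                  String start, String stop) {
        List<String> out = new ArrayList<>();
        for (int b = 0; b < BUCKETS; b++) {
            String p = b + "_";
            out.addAll(table.subMap(p + start, p + stop).values());
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put("0_20110319", "a");
        table.put("1_20110320", "b");
        table.put("2_20110318", "c");
        table.put("3_20110320", "d");
        System.out.println(scanRange(table, "20110319", "20110321"));
        // → [a, b, d]
    }
}
```

The load is spread almost as well as with fully random keys, while a scan costs N range reads instead of a full-table filter.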
Re: hash function per table
Posted by Oleg Ruchovets <or...@gmail.com>.
Actually the discussion started from this post:
http://search-hadoop.com/m/XX3nW68JsY1/hbase+insertion+optimisation&subj=hbase+insertion+optimisation+
Simply inserting data with row key <date>_<somedata>, I noticed that only
one node works (the region to which the data were being written). If we have
10-15 nodes, I think it is inefficient to write data to only one region. I
want the data to be inserted to as many nodes as possible simultaneously.
Correct me if I'm wrong, but in this case the writing job will take less
time, am I right?
Oleg.
Re: hash function per table
Posted by Chris Tarnas <cf...@email.com>.
There is none - HBase uses a total order partitioner. The key value itself determines which region a row is put into. This allows for very rapid scans of sequential data, among other things, but it does mean it is easier to hotspot regions. Key design is very important.
-chris
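Since region placement is determined entirely by the row key, the usual workaround is to build the "hash" into the key itself rather than configure one per table. A minimal sketch in plain Java (the bucket count, key format, and class name are illustrative, not an HBase API):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKey {
    // Number of salt buckets; in practice this might be sized to the
    // expected region count.
    static final int BUCKETS = 16;

    // Prefix the sequential key with a stable, hash-derived bucket so that
    // writes spread across regions instead of piling onto one.
    static String salt(String sequentialKey) {
        int bucket = hash(sequentialKey) % BUCKETS;
        return String.format("%02d_%s", bucket, sequentialKey);
    }

    // Derive a small non-negative integer from the first two MD5 bytes.
    static int hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            return ((d[0] & 0xff) << 8) | (d[1] & 0xff);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(salt("20110320_temp"));
        System.out.println(salt("20110320_humidity"));
    }
}
```

Because the salt is derived from the key, point gets stay cheap (recompute the salt on read); the cost shows up on range scans, which must now fan out across the buckets.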
Re: hash function per table
Posted by Lior Schachter <li...@infolinks.com>.
The hash function that distributes the rows among the regions.
Re: hash function per table
Posted by Stack <st...@duboce.net>.
Hash? Which hash are you referring to sir?
St.Ack