Posted to user@hbase.apache.org by Alex Baranau <al...@gmail.com> on 2011/04/21 15:56:59 UTC

Re: Hash keys

For those who are looking for a solution to this or a similar issue, this
can be useful:

Take a look at the HBaseWD (https://github.com/sematext/HBaseWD) lib, which
implements a solution close to the one Lars described.
There is also some info here: http://search-hadoop.com/m/AQ7CG2GkiO

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles
<er...@u-mangate.com> wrote:

> Hi Lars,
> Many tks for your reply.
>
> For now, I just rely on random or hashed keys and don't need any range
> queries.
> I will have to choose a nice solution one day for ordered keys upon which I
> will range-query.
>
> I will post the results of the different data models I will try (looking
> for other threads/articles to have a better view of the options).
>
> Tks,
> - Eric
>
>
> On 16/03/2011 23:02, Lars George wrote:
>
>> Hi Eric,
>>
>> Oops, you are right, my example was not clear and actually confused
>> the hashed keys with sequential ones. The hash should map every Nth row key
>> to the same bucket, so that you would for example see an interleaved
>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>> big issue. And yes, this is currently all client side driven :(
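
The interleaving Lars describes is just a modulo assignment of sequential keys to N buckets. A minimal sketch (the 1-based bucket numbers are only there to match the region numbering in his example):

```java
public class ModuloBuckets {
    // Sequential keys interleave across N buckets:
    // keys 1, 8, 15 -> bucket 1; keys 2, 9, 16 -> bucket 2; and so on.
    static int bucket(long key, int n) {
        return (int) ((key - 1) % n) + 1; // 1-based, illustrative only
    }

    public static void main(String[] args) {
        for (long key : new long[] {1, 8, 15, 2, 9, 16}) {
            System.out.println("key " + key + " -> bucket " + bucket(key, 7));
        }
    }
}
```

Each bucket then receives an even share of a monotonically increasing key stream, which is exactly what spreads the write load across regions.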
>>
>> Lars
>>
>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>
>>> Hi Lars,
>>> Many tks for your explanations!
>>>
>>> About the DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>>> different cases (just rephrasing what you said to be sure I get it):
>>>
>>> - Keys are really random (GUID or whatever): you get the distribution for
>>> free, but can't do, and probably don't need, range queries.
>>>
>>> - If keys are monotonically increasing (timestamp, autoincremented, ...),
>>> there are two cases:
>>> 1) Sometimes you don't need range queries and can store the key
>>> as a real hash (MD5, ...) to get distribution.
>>> 2) For time-based series, for example, you may need range queries,
>>> and adding a salt can be an answer that combines the best of both worlds.
>>>
>>> I understand the "salt" approach as recreating "artificial" key spaces on
>>> the client side.
>>>
>>> I was first confused reading "row 1...1000 -> prefix h1_".
>>> To really make the distribution random, I would have expected the prefix/salt
>>> to be assigned randomly to each key, leading for example to an h1 keyspace
>>> such as:
>>> h1_key2032, h1_key0023, h1_key1014343, ...
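
Random-looking salts and point lookups can be reconciled by deriving the salt deterministically from the key itself. A minimal sketch of that idea in plain JDK code (the bucket count, the `h<n>_` convention, and the choice of MD5 are all illustrative, not anyone's actual scheme):

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Salter {
    static final int BUCKETS = 7; // illustrative bucket count

    /** Derive a stable "h<n>_" prefix from the key's own MD5, then prepend it. */
    static String saltedKey(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            // Interpret the digest as an unsigned integer and reduce it to a bucket.
            int bucket = new BigInteger(1, digest).mod(BigInteger.valueOf(BUCKETS)).intValue();
            return "h" + (bucket + 1) + "_" + key;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        // The prefix looks random across keys, but a point get can recompute it
        // from the plain key, so no client-side mapping table is needed.
        System.out.println(saltedKey("key2032"));
        System.out.println(saltedKey("key0023"));
    }
}
```

Because the prefix is a pure function of the key, point gets never need a stored salt mapping; only range scans over the logical order require the multi-scanner merge discussed below in the thread.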
>>>
>>> Maybe you meant the intermediate approach where keys of "hour 1" go
>>> to the h1 keyspace, keys of "hour 2" to the h2 keyspace, ...
>>> In that case, if you look for keys in "hour 1", you only need one
>>> scanner because you know they reside in "h1_", and you could query
>>> with
>>> scan(h1_time1, h1_time2).
>>>
>>> But at times, as you describe, you may need to scan different buckets
>>> with
>>> different scanners and use an ordered list to collect the result.
>>> - What about performance in that case? For a very large dataset, a range
>>> query
>>> will take much time. I can imagine an async client coming to the rescue. Maybe
>>> mapreduce jobs could also help, because they will benefit from data locality.
>>> - Also, the client application must manage the salts: it's a bit like
>>> reinventing a "salt" layer on top of the hbase region servers, letting
>>> the client carry this layer. The client will have to store (in hbase :))
>>> the
>>> mapping between key ranges and their salt prefixes. It's a bit like
>>> exporting some core(?) functionality to the client.
>>>
>>> Strange, I feel I missed your point :)
>>> Tks,
>>>
>>> - Eric
>>>
>>> Sidenote: ...and yes, it seems I will have to learn some Ruby stuff
>>> (should
>>> get used to it, since I just learned another scripting language running on
>>> the JVM
>>> for another project...)
>>>
>>>
>>> On 16/03/2011 13:00, Lars George wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>> it is easy... *cough*) or write the same in a small Java app and use
>>>> it from the command line or so.
>>>>
>>>> And yes, you can range scan using a prefix. We were discussing this
>>>> recently and there is this notion of design for reads, or design for
>>>> writes. DFR usually means sequential keys and DFW random keys. It is
>>>> tough to find common ground as the two designs are at opposite ends of the
>>>> same spectrum. A middle ground is the bucketed (or salted)
>>>> approach, which gives you distribution while still being able to scan...
>>>> but not without some client side support. One typical class of data is
>>>> timeseries based keys. As for scanning them, you need N client side
>>>> scanners. Imagine this example:
>>>>
>>>> row    1 ... 1000 -> Prefix "h1_"
>>>> row 1001 ... 2000 -> Prefix "h2_"
>>>> row 2001 ... 3000 -> Prefix "h3_"
>>>> row 3001 ... 4000 -> Prefix "h4_"
>>>> row 4001 ... 5000 -> Prefix "h5_"
>>>> row 5001 ... 6000 -> Prefix "h6_"
>>>> row 6001 ... 7000 -> Prefix "h7_"
>>>>
>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>> (also sometimes called a salt) are used to distribute the row keys to
>>>> region servers. To scan the entire range as one large key space you
>>>> need to create 7 scanners:
>>>>
>>>> 1. scanner: start row: "h1_", end row "h2_"
>>>> 2. scanner: start row: "h2_", end row "h3_"
>>>> 3. scanner: start row: "h3_", end row "h4_"
>>>> 4. scanner: start row: "h4_", end row "h5_"
>>>> 5. scanner: start row: "h5_", end row "h6_"
>>>> 6. scanner: start row: "h6_", end row "h7_"
>>>> 7. scanner: start row: "h7_", end row ""
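
The seven ranges above follow a mechanical pattern: each bucket's stop row is the next bucket's start row, and the last bucket scans to the end of the table (empty stop row), so they can be generated rather than hard-coded. A small sketch (note that with 10 or more buckets the plain `h<n>_` prefixes would no longer sort correctly, since "h10_" sorts before "h1_" lexicographically; fixed-width prefixes like "h01_" avoid that):

```java
import java.util.ArrayList;
import java.util.List;

public class BucketRanges {
    /** Start/stop row pairs covering all buckets; an empty stop row means "to end of table". */
    static List<String[]> scanRanges(int buckets) {
        List<String[]> ranges = new ArrayList<>();
        for (int i = 1; i <= buckets; i++) {
            String start = "h" + i + "_";
            String stop = (i < buckets) ? "h" + (i + 1) + "_" : "";
            ranges.add(new String[] {start, stop});
        }
        return ranges;
    }

    public static void main(String[] args) {
        for (String[] r : scanRanges(7)) {
            System.out.println("start \"" + r[0] + "\", stop \"" + r[1] + "\"");
        }
    }
}
```

Each pair would then feed one HBase Scan (start row, stop row), giving exactly the seven scanners listed above.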
>>>>
>>>> Now each of them gives you the first row that matches the start and
>>>> end row keys it is configured for. So you then take that first KV
>>>> it offers and add it to a list, sorted by kv.getRow() while removing
>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer;
>>>> you then split off and drop the prefix "h1_" to get "1". The list then would
>>>> hold something like:
>>>>
>>>> 1. row "1"    -> kv from scanner 1
>>>> 2. row "1010" -> kv from scanner 2
>>>> 3. row "2001" -> kv from scanner 3
>>>> 4. row "3033" -> kv from scanner 4
>>>> 5. row "4001" -> kv from scanner 5
>>>> 6. row "5002" -> kv from scanner 6
>>>> 7. row "6000" -> kv from scanner 7
>>>>
>>>> (assuming that the keys are not contiguous but have gaps)
>>>>
>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>> offering. Then insert that into the list and you get
>>>>
>>>> 1. row "3"    -> kv from scanner 1
>>>> 2. row "1010" -> kv from scanner 2
>>>> 3. row "2001" -> kv from scanner 3
>>>> 4. row "3033" -> kv from scanner 4
>>>> 5. row "4001" -> kv from scanner 5
>>>> 6. row "5002" -> kv from scanner 6
>>>> 7. row "6000" -> kv from scanner 7
>>>>
>>>> Notice how you only ever have a list of N elements on the client
>>>> side, each representing the next value its scanner offers. Since the
>>>> list is sorted, you always access item #1 and therefore the next row in
>>>> the entire key space.
>>>>
>>>> Once scanner 1 runs out, you can close and remove it; the list will
>>>> then give you values from scanner 2 as its first elements. And
>>>> so on.
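
The take-the-top / refill loop Lars describes is a classic k-way merge, typically driven by a priority queue keyed on the prefix-stripped row. A self-contained sketch with plain in-memory iterators standing in for the HBase scanners (real code would compare raw byte[] rows with HBase's key comparator; here the logical keys are numeric strings, so an integer comparison stands in for it):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class SaltedScanMerge {
    /** One scanner plus the last row it offered, with the "hN_" prefix stripped. */
    static class Head {
        final Iterator<String> scanner;
        final String row; // logical row key, salt prefix removed

        Head(Iterator<String> scanner, String salted) {
            this.scanner = scanner;
            this.row = salted.substring(salted.indexOf('_') + 1);
        }
    }

    /** Merge pre-sorted per-bucket scans into one stream ordered by logical key. */
    static List<String> merge(List<List<String>> buckets) {
        PriorityQueue<Head> heads = new PriorityQueue<>(
            (a, b) -> Integer.compare(Integer.parseInt(a.row), Integer.parseInt(b.row)));
        for (List<String> bucket : buckets) {
            Iterator<String> it = bucket.iterator();
            if (it.hasNext()) heads.add(new Head(it, it.next())); // prime each scanner
        }
        List<String> result = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head top = heads.poll();           // lowest logical key across all scanners
            result.add(top.row);
            if (top.scanner.hasNext()) {       // refill from the same scanner, reordering
                heads.add(new Head(top.scanner, top.scanner.next()));
            }                                  // else: that scanner is exhausted; drop it
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> buckets = Arrays.asList(
            Arrays.asList("h1_1", "h1_3"),
            Arrays.asList("h2_1010", "h2_1500"),
            Arrays.asList("h3_2001"));
        System.out.println(merge(buckets)); // logical rows in global order
    }
}
```

Each scanner contributes at most one buffered element, so client memory stays proportional to the bucket count regardless of the range size, which is exactly the property Lars points out.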
>>>>
>>>> Makes more sense?
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>>> <er...@u-mangate.com>    wrote:
>>>>
>>>>> Hi Lars,
>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>> I can find Python scripts, but no JRuby ones...
>>>>>
>>>>> Aside from the hash function I could reuse, are you saying that range
>>>>> queries
>>>>> are
>>>>> possible even with hashed keys (randomly distributed)?
>>>>> (If possible with the script, it will also be possible from the hbase
>>>>> java
>>>>> client).
>>>>> Even with your explanation, I can't figure out how compound keys
>>>>> (hashedkey+key) can be range-queried.
>>>>>
>>>>> Tks,
>>>>> - Eric
>>>>>
>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>> leading hashes to distribute them across servers. When you want to do
>>>>>> scans you need to create N scans, where N is the number of hashes and
>>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>> first. You can then access the rows in sorted order where the first
>>>>>> element in the list is the one with the first key to read. Once you
>>>>>> take off the first element (being the lowest KV key), you call next() on the
>>>>>> underlying scanner and reinsert its offering into the list, reordering it. You
>>>>>> keep taking from the top and therefore always see the entire range,
>>>>>> even if the same scanner would return the next logical rows to read.
>>>>>>
>>>>>> The shell is written in JRuby, so any function you can use there would
>>>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>>> do this with the above approach in code. Though since this is JRuby
>>>>>> you could write that code in Ruby and add it to your local shell, giving
>>>>>> you what you need.
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>>> <er...@u-mangate.com>      wrote:
>>>>>>
>>>>>>> Oops, forget my first question about range queries (if keys are hashed,
>>>>>>> they
>>>>>>> cannot be queried by range...)
>>>>>>> Still curious for info on hash functions in the shell (2.) and
>>>>>>> advice
>>>>>>> on MD5/Jenkins/SHA1 (3.)
>>>>>>> Tks,
>>>>>>> Eric
>>>>>>>
>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>> tables.
>>>>>>>>
>>>>>>>> 1. I wonder if this strategy is advisable for the range-query (from/to
>>>>>>>> key)
>>>>>>>> use case, because the rows will be randomly distributed in different
>>>>>>>> regions. Will it cause some performance loss?
>>>>>>>> 2. Is it possible to query from the hbase shell with something like "get
>>>>>>>> 't1',
>>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>>> readable
>>>>>>>> key?
>>>>>>>> 3. There are MD5 and Jenkins classes in the hbase.util package. What
>>>>>>>> would
>>>>>>>> you
>>>>>>>> advise? What about SHA1?
>>>>>>>>
>>>>>>>> Tks,
>>>>>>>> - Eric
>>>>>>>>
>>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>>
>>>>>>>>
>>>
>

Re: Hash keys

Posted by Alex Baranau <al...@gmail.com>.
I needed to be sure it works for pre-0.90 versions *too*, because that is
what one of our clusters uses (as, I believe, do most clusters currently in
use by others).
It works for newer versions too. The most recent jar is available in the
project sources root, so one can just download it and use it with his/her
own cluster.

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Apr 21, 2011 at 6:04 PM, Eric Charles <er...@u-mangate.com> wrote:

> Hi Alex,
>
> Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase"
> thread. Tks for this :) - I will need some more time to test it (quite busy
> atm).
>
> Why did you declare 0.89.20100924. and not 0.91 (or 92) as HBaseWD
> dependency?
>
> Tks,
> - Eric

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi Alex,

Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase" 
thread. Tks for this :) - I will need some more time to test it (quite 
busy atm).

Why did you declare 0.89.20100924. and not 0.91 (or 92) as HBaseWD 
dependency?

Tks,
- Eric

