Posted to user@hbase.apache.org by Eric Charles <er...@u-mangate.com> on 2011/03/16 09:52:51 UTC

Hash keys

Hi,

To help avoid hotspots, I'm planning to use hashed keys in some tables.

1. I wonder if this strategy is advised for the range-query (from/to key)
use case, because the rows will be randomly distributed in different
regions. Will it cause some performance loss?
2. Is it possible to query from the hbase shell with something like "get
't1', @hash('r1')", to let the shell compute the hash for you from the
readable key?
3. There are MD5 and Jenkins classes in the hbase.util package. What would
you advise? What about SHA1?

Tks,
- Eric

PS: I searched the archive but didn't find the answers.
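
For illustration of question 2, a minimal sketch that already works in the
JRuby-based shell, assuming rows were stored under their hex MD5 string
(MD5Hash.getMD5AsHex(byte[]) is in org.apache.hadoop.hbase.util, and
to_java_bytes is plain JRuby):

  get 't1', org.apache.hadoop.hbase.util.MD5Hash.getMD5AsHex('r1'.to_java_bytes)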


Re: problem bringing HBase back up after power outage and removal of nodes

Posted by Ryan Rawson <ry...@gmail.com>.
If you are in safe mode it's because not all datanodes have reported
in.  So actually NO, your Hadoop did NOT come up properly.

Check your NN pages, look for any missing nodes.  They won't help you
with more than telling you what is online or not.

Good luck!
-ryan
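
A quick way to check both from the command line, using the stock dfsadmin
tool of this Hadoop generation (-safemode get shows whether the NN is still
in safe mode; -report lists live and dead datanodes):

  hadoop dfsadmin -safemode get
  hadoop dfsadmin -report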

On Thu, Mar 17, 2011 at 11:12 AM, Vishal Kapoor
<vi...@gmail.com> wrote:
> you should have more info in the logs on why dfs is in safe mode;
> you can always leave safe mode:
>
> hadoop dfsadmin -safemode leave
>
> but again, that's a symptom, not the problem.
>
> Vishal
>
> On Thu, Mar 17, 2011 at 1:55 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:
>
>> Folks,
>>
>> We had a power outage here, and we are trying to bring our Hadoop/HBase
>> cluster back up. Hadoop has been just fine - came up smoothly. HBase has
>> not. Our HBase master log file is filled with just one msg:
>>
>> 2011-03-17 10:50:08,712 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
>> for dfs to exit safe mode...
>> 2011-03-17 10:50:18,714 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
>> for dfs to exit safe mode...
>>
>> After the power outage, before we tried bringing things back up, we took
>> several nodes off-line, as their hard drives were failing and needed
>> replacing. Don't know if the loss of those nodes has anything to do with the
>> error msg that we are seeing.
>>
>> Could anybody give us some advice as to where to look for the cause of the
>> HBase failure? We would very much appreciate guidance.
>>
>> Ron
>>
>> Ronald Taylor, Ph.D.
>> Computational Biology & Bioinformatics Group
>> Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
>> Richland, WA 99352
>> phone: (509) 372-6568
>> email: ronald.taylor@pnl.gov
>>
>>
>>
>

RE: problem bringing HBase back up after power outage and removal of nodes

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
Ryan, Vishal,

Yep, right after I sent the email we figured out that the problem was on the Hadoop side. We are tracking it down; thanks for the very quick responses.
Ron

Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: ronald.taylor@pnl.gov


-----Original Message-----
From: Vishal Kapoor [mailto:vishal.kapoor.in@gmail.com] 
Sent: Thursday, March 17, 2011 11:12 AM
To: user@hbase.apache.org
Subject: Re: problem bringing HBase back up after power outage and removal of nodes

you should have more info in the logs on why dfs is in safe mode;
you can always leave safe mode:

hadoop dfsadmin -safemode leave

but again, that's a symptom, not the problem.

Vishal

On Thu, Mar 17, 2011 at 1:55 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:

> Folks,
>
> We had a power outage here, and we are trying to bring our Hadoop/HBase
> cluster back up. Hadoop has been just fine - came up smoothly. HBase has
> not. Our HBase master log file is filled with just one msg:
>
> 2011-03-17 10:50:08,712 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
> for dfs to exit safe mode...
> 2011-03-17 10:50:18,714 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
> for dfs to exit safe mode...
>
> After the power outage, before we tried bringing things back up, we took
> several nodes off-line, as their hard drives were failing and needed
> replacing. Don't know if the loss of those nodes has anything to do with the
> error msg that we are seeing.
>
> Could anybody give us some advice as to where to look for the cause of the
> HBase failure? We would very much appreciate guidance.
>
> Ron
>
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
> Richland, WA 99352
> phone: (509) 372-6568
> email: ronald.taylor@pnl.gov
>
>
>

Re: problem bringing HBase back up after power outage and removal of nodes

Posted by Vishal Kapoor <vi...@gmail.com>.
you should have more info in the logs on why dfs is in safe mode;
you can always leave safe mode:

hadoop dfsadmin -safemode leave

but again, that's a symptom, not the problem.

Vishal

On Thu, Mar 17, 2011 at 1:55 PM, Taylor, Ronald C <ro...@pnl.gov> wrote:

> Folks,
>
> We had a power outage here, and we are trying to bring our Hadoop/HBase
> cluster back up. Hadoop has been just fine - came up smoothly. HBase has
> not. Our HBase master log file is filled with just one msg:
>
> 2011-03-17 10:50:08,712 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
> for dfs to exit safe mode...
> 2011-03-17 10:50:18,714 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting
> for dfs to exit safe mode...
>
> After the power outage, before we tried bringing things back up, we took
> several nodes off-line, as their hard drives were failing and needed
> replacing. Don't know if the loss of those nodes has anything to do with the
> error msg that we are seeing.
>
> Could anybody give us some advice as to where to look for the cause of the
> HBase failure? We would very much appreciate guidance.
>
> Ron
>
> Ronald Taylor, Ph.D.
> Computational Biology & Bioinformatics Group
> Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
> Richland, WA 99352
> phone: (509) 372-6568
> email: ronald.taylor@pnl.gov
>
>
>

problem bringing HBase back up after power outage and removal of nodes

Posted by "Taylor, Ronald C" <ro...@pnl.gov>.
Folks,

We had a power outage here, and we are trying to bring our Hadoop/HBase cluster back up. Hadoop has been just fine - came up smoothly. HBase has not. Our HBase master log file is filled with just one msg:

2011-03-17 10:50:08,712 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...
2011-03-17 10:50:18,714 INFO org.apache.hadoop.hbase.util.FSUtils: Waiting for dfs to exit safe mode...

After the power outage, before we tried bringing things back up, we took several nodes off-line, as their hard drives were failing and needed replacing. Don't know if the loss of those nodes has anything to do with the error msg that we are seeing.

Could anybody give us some advice as to where to look for the cause of the HBase failure? We would very much appreciate guidance.

Ron

Ronald Taylor, Ph.D.
Computational Biology & Bioinformatics Group
Pacific Northwest National Laboratory (U.S. Dept of Energy/Battelle)
Richland, WA 99352
phone: (509) 372-6568
email: ronald.taylor@pnl.gov



Re: Hash keys

Posted by Alex Baranau <al...@gmail.com>.
I needed to be sure it works for the pre-0.90 version *too*, because this is
what one of our clusters uses (as, I believe, do most clusters others have
in use).
It works for newer versions too. The most recent jar is available for
download in the project's source root, so one can just download it and use
it with his/her own cluster.
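
For orientation, a minimal usage sketch, going from memory of the project
README; the package, class, and method names below are assumptions to verify
against the GitHub sources linked in the quoted mail:

import org.apache.hadoop.hbase.client.Put;
// Names recalled from the HBaseWD README; verify against
// https://github.com/sematext/HBaseWD before relying on them.
import com.sematext.hbase.wd.AbstractRowKeyDistributor;
import com.sematext.hbase.wd.RowKeyDistributorByOneBytePrefix;

public class HBaseWDSketch {
  public static Put saltedPut(byte[] originalKey) {
    // Spread sequential keys over 32 one-byte-prefix buckets (assumed ctor).
    AbstractRowKeyDistributor distributor =
        new RowKeyDistributorByOneBytePrefix((byte) 32);
    return new Put(distributor.getDistributedKey(originalKey));
  }
}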

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Apr 21, 2011 at 6:04 PM, Eric Charles <er...@u-mangate.com> wrote:

> Hi Alex,
>
> Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase"
> thread. Tks for this :) - I will need some more time to test it (quite busy
> atm).
>
> Why did you declare 0.89.20100924 and not 0.91 (or 92) as the HBaseWD
> dependency?
>
> Tks,
> - Eric
>
>
>
> On 21/04/2011 15:56, Alex Baranau wrote:
>
>> For those who are looking for the solution to this or similar issue, this
>> can be useful:
>>
>> Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which
>> implements a solution close to what Lars described.
>> Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO
>>
>> Alex Baranau
>> ----
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
>> HBase
>>
>> On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles
>> <er...@u-mangate.com> wrote:
>>
>>  Hi Lars,
>>> Many tks for your reply.
>>>
>>> For now, I just rely on random or hashed keys and don't need any range
>>> queries.
>>> I will have to choose a nice solution one day for ordered keys upon which
>>> I
>>> will range-query.
>>>
>>> I will post the results of the different data models I will try (looking
>>> for other threads/articles to have a better view of the options).
>>>
>>> Tks,
>>> - Eric
>>>
>>>
>>> On 16/03/2011 23:02, Lars George wrote:
>>>
>>>  Hi Eric,
>>>>
>>>> Oops, you are right, my example was not clear and actually confusing
>>>> the keys with sequential ones. The hash should map every Nth row key
>>>> to the same bucket, so that you would for example see an interleaved
>>>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>>>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>>>> big issue. And yes, this is currently all client side driven :(
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
>>>> <er...@u-mangate.com>   wrote:
>>>>
>>>>  Hi Lars,
>>>>> Many tks for your explanations!
>>>>>
>>>>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>>>>> different cases (just rephrasing what you said to be sure I get it):
>>>>>
>>>>> - Keys are really random (GUID or whatever): you have the distribution
>>>>> for
>>>>> free, still can't do, and probably don't need, range-queries.
>>>>>
>>>>> - If keys are monotonically increasing (timestamp,
>>>>> autoincremented,...),
>>>>> there are two cases:
>>>>> 1) sometimes, you don't need to do some range-queries and can store the
>>>>> key
>>>>> as a real hash (md5,...) to have distribution.
>>>>> 2) For timebased series for example, you may need to do some range
>>>>> queries,
>>>>> and adding a salt can be an answer to combine best-of-world.
>>>>>
>>>>> I understand the "salt" approach as recreating on the client side
>>>>> "artifical" key spaces.
>>>>>
>>>>> I was first confused reading "row 1...1000 ->   prefix h1_".
>>>>> To really make the distribution random, I would have seen prefix/salt
>>>>> attributed randomly for a key leading to for example a h1 keyspace as
>>>>> such:
>>>>> h1_key2032, h1_key0023, h1_key1014343, ...
>>>>>
>>>>> Maybe you meant the intermediate approach where time keys of "hour 1"
>>>>> going
>>>>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
>>>>> In that case, if you look for keys in "hour 1", you would only need one
>>>>> scanner cause you know that they reside in "h1_", and you could query
>>>>> with
>>>>> scan(h1_time1, h1_time2).
>>>>>
>>>>> But at times, as you describe, you may need to scan different buckets
>>>>> with
>>>>> different scanners and use an ordered list to contain the result.
>>>>> - What about performance in that case? For a very large dataset, a range
>>>>> query
>>>>> will take much time. I can imagine async client at the rescue. Maybe
>>>>> also
>>>>> mapreduce jobs could help cause it will benefit from data locality.
>>>>> - Also, the client application must manage the salts: it's a bit like
>>>>> reinventing a "salt" layer on top of the hbase region servers, letting
>>>>> client carry on this layer. The client will have to store (in hbase :))
>>>>> the
>>>>> mapping between key ranges and their salt prefixes. It's a bit like
>>>>> exporting some core? functionality to the client.
>>>>>
>>>>> Strange, I feel I missed your point :)
>>>>> Tks,
>>>>>
>>>>> - Eric
>>>>>
>>>>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff
>>>>> (should
>>>>> get used to, cause I just learned another scripting language running on
>>>>> jvm
>>>>> for another project...)
>>>>>
>>>>>
>>>>> On 16/03/2011 13:00, Lars George wrote:
>>>>>
>>>>>  Hi Eric,
>>>>>>
>>>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>>>> it is easy... *cough*) or write that same in a small Java app and use
>>>>>> it from the command line or so.
>>>>>>
>>>>>> And yes, you can range scan using a prefix. We were discussing this
>>>>>> recently and there is this notion of design for reads, or design for
>>>>>> writes. DFR is usually sequential keys and DFW is random keys. It is
>>>>>> tough to find common grounds as both designs are on the far end of the
>>>>>> same spectrum. Finding a middle ground is the bucketed (or salted)
>>>>>> approach, which gives you distribution but still being able to scan...
>>>>>> but not without some client side support. One typical class of data is
>>>>>> timeseries based keys. As for scanning them, you need N client side
>>>>>> scanners. Imagine this example:
>>>>>>
>>>>>> row       1 ... 1000 ->     Prefix "h1_"
>>>>>> row 1001 ... 2000 ->     Prefix "h2_"
>>>>>> row 2001 ... 3000 ->     Prefix "h3_"
>>>>>> row 3001 ... 4000 ->     Prefix "h4_"
>>>>>> row 4001 ... 5000 ->     Prefix "h5_"
>>>>>> row 5001 ... 6000 ->     Prefix "h6_"
>>>>>> row 6001 ... 7000 ->     Prefix "h7_"
>>>>>>
>>>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>>>> (also sometimes called salt) are used to distribute the row keys to
>>>>>> region servers. To scan the entire range as one large key space you
>>>>>> need to create 7 scanners:
>>>>>>
>>>>>> 1. scanner: start row: "h1_", end row "h2_"
>>>>>> 2. scanner: start row: "h2_", end row "h3_"
>>>>>> 3. scanner: start row: "h3_", end row "h4_"
>>>>>> 4. scanner: start row: "h4_", end row "h5_"
>>>>>> 5. scanner: start row: "h5_", end row "h6_"
>>>>>> 6. scanner: start row: "h6_", end row "h7_"
>>>>>> 7. scanner: start row: "h7_", end row ""
>>>>>>
>>>>>> Now each of them gives you the first row that matches the start and
>>>>>> end row keys they are configured for. So you then take that first KV
>>>>>> they offer and add it to a list, sorted by kv.getRow() while removing
>>>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>>>>>> then split and drop the prefix "h1_" to get "1". The list then would
>>>>>> hold something like:
>>>>>>
>>>>>> 1. row "1" ->     kv from scanner 1
>>>>>> 2. row "1010" ->     kv from scanner 2
>>>>>> 3. row "2001" ->     kv from scanner 3
>>>>>> 4. row "3033" ->     kv from scanner 4
>>>>>> 5. row "4001" ->     kv from scanner 5
>>>>>> 6. row "5002" ->     kv from scanner 6
>>>>>> 7. row "6000" ->     kv from scanner 7
>>>>>>
>>>>>> (assuming that the keys are not contiguous but have gaps)
>>>>>>
>>>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>>>> offering. Then insert that into the list and you get
>>>>>>
>>>>>> 1. row "3" ->     kv from scanner 1
>>>>>> 2. row "1010" ->     kv from scanner 2
>>>>>> 3. row "2001" ->     kv from scanner 3
>>>>>> 4. row "3033" ->     kv from scanner 4
>>>>>> 5. row "4001" ->     kv from scanner 5
>>>>>> 6. row "5002" ->     kv from scanner 6
>>>>>> 7. row "6000" ->     kv from scanner 7
>>>>>>
>>>>>> Notice how you always only have a list with N elements on the client
>>>>>> side, each representing the next value the scanners offer. Since the
>>>>>> list is sorted you always access item #1 and therefore the next in the
>>>>>> entire key space.
>>>>>>
>>>>>> Once scanner 1 runs out you can close and remove it, the list will
>>>>>> then give you values from scanner 2 as the first elements in it. And
>>>>>> so on.
>>>>>>
>>>>>> Makes more sense?
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>>>>> <er...@u-mangate.com>     wrote:
>>>>>>
>>>>>>  Hi Lars,
>>>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>>>> I can find python scripts, but no jruby one...
>>>>>>>
>>>>>>> Aside from the hash function I could reuse, are you saying that range
>>>>>>> queries
>>>>>>> are
>>>>>>> possible even with hashed keys (randomly distributed)?
>>>>>>> (If possible with the script, it will also be possible from the hbase
>>>>>>> java
>>>>>>> client).
>>>>>>> Even with your explanation, I can't figure out how compound keys
>>>>>>> (hashedkey+key) can be range-queried.
>>>>>>>
>>>>>>> Tks,
>>>>>>> - Eric
>>>>>>>
>>>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>>>
>>>>>>>  Hi Eric,
>>>>>>>>
>>>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>>>> leading hashes to distribute them across servers. When you want to
>>>>>>>> do
>>>>>>>> scans you need to create N scans, where N is the number of hashes
>>>>>>>> and
>>>>>>>> then do a next() on each scanner, putting all KVs into one sorted
>>>>>>>> list
>>>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>>>> first. You can then access the rows in sorted order where the first
>>>>>>>> element in the list is the one with the first key to read. Once you
>>>>>>>> take off the first element (being the lowest KV key) you next the
>>>>>>>> underlying scanner and reinsert it into the list, reordering it. You
>>>>>>>> keep taking from the top and therefore always see the entire range,
>>>>>>>> even if the same scanner would return the next logical rows to read.
>>>>>>>>
>>>>>>>> The shell is written in JRuby, so any function you can use there
>>>>>>>> would
>>>>>>>> make sense to use in the prefix, then you could compute it on the
>>>>>>>> fly.
>>>>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>>>>> do this with the above approach in code. Though since this is JRuby
>>>>>>>> you could write that code in Ruby and add it to your local shell
>>>>>>>> giving
>>>>>>>> you what you need.
>>>>>>>>
>>>>>>>> Lars
>>>>>>>>
>>>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>>>>> <er...@u-mangate.com>       wrote:
>>>>>>>>
>>>>>>>>  Oops, forget my first question about range query (if keys are
>>>>>>>>> hashed,
>>>>>>>>> they
>>>>>>>>> can not be queried based on a range...)
>>>>>>>>> Still curious to have info on the hash function in the shell (2.) and
>>>>>>>>> advice
>>>>>>>>> on md5/jenkins/sha1 (3.)
>>>>>>>>> Tks,
>>>>>>>>> Eric
>>>>>>>>>
>>>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>>>
>>>>>>>>>  Hi,
>>>>>>>>>>
>>>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>>>> tables.
>>>>>>>>>>
>>>>>>>>>> 1. I wonder if this strategy is advised for range queries (from/to
>>>>>>>>>> key)
>>>>>>>>>> use case, because the rows will be randomly distributed in
>>>>>>>>>> different
>>>>>>>>>> regions. Will it cause some performance loss?
>>>>>>>>>> 2. Is it possible to query from hbase shell with something like
>>>>>>>>>> "get
>>>>>>>>>> 't1',
>>>>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>>>>> readable
>>>>>>>>>> key.
>>>>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What
>>>>>>>>>> would
>>>>>>>>>> you
>>>>>>>>>> advise? What about SHA1?
>>>>>>>>>>
>>>>>>>>>> Tks,
>>>>>>>>>> - Eric
>>>>>>>>>>
>>>>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>
>>>
>>

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi Alex,

Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase" 
thread. Tks for this :) - I will need some more time to test it (quite 
busy atm).

Why did you declare 0.89.20100924 and not 0.91 (or 92) as the HBaseWD
dependency?

Tks,
- Eric


On 21/04/2011 15:56, Alex Baranau wrote:
> For those who are looking for the solution to this or similar issue, this
> can be useful:
>
> Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which
> implements a solution close to what Lars described.
> Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO
>
> Alex Baranau
> ----
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
>
> On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles
> <er...@u-mangate.com> wrote:
>
>> Hi Lars,
>> Many tks for your reply.
>>
>> For now, I just rely on random or hashed keys and don't need any range
>> queries.
>> I will have to choose a nice solution one day for ordered keys upon which I
>> will range-query.
>>
>> I will post the results of the different data models I will try (looking
>> for other threads/articles to have a better view of the options).
>>
>> Tks,
>> - Eric
>>
>>
>> On 16/03/2011 23:02, Lars George wrote:
>>
>>> Hi Eric,
>>>
>>> Oops, you are right, my example was not clear and actually confusing
>>> the keys with sequential ones. The hash should map every Nth row key
>>> to the same bucket, so that you would for example see an interleaved
>>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>>> big issue. And yes, this is currently all client side driven :(
>>>
>>> Lars
>>>
>>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
>>> <er...@u-mangate.com>   wrote:
>>>
>>>> Hi Lars,
>>>> Many tks for your explanations!
>>>>
>>>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>>>> different cases (just rephrasing what you said to be sure I get it):
>>>>
>>>> - Keys are really random (GUID or whatever): you have the distribution
>>>> for
>>>> free, still can't do, and probably don't need, range-queries.
>>>>
>>>> - If keys are monotonically increasing (timestamp, autoincremented,...),
>>>> there are two cases:
>>>> 1) sometimes, you don't need to do some range-queries and can store the
>>>> key
>>>> as a real hash (md5,...) to have distribution.
>>>> 2) For timebased series for example, you may need to do some range
>>>> queries,
>>>> and adding a salt can be an answer to combine best-of-world.
>>>>
>>>> I understand the "salt" approach as recreating on the client side
>>>> "artifical" key spaces.
>>>>
>>>> I was first confused reading "row 1...1000 ->   prefix h1_".
>>>> To really make the distribution random, I would have seen prefix/salt
>>>> attributed randomly for a key leading to for example a h1 keyspace as
>>>> such:
>>>> h1_key2032, h1_key0023, h1_key1014343, ...
>>>>
>>>> Maybe you meant the intermediate approach where time keys of "hour 1"
>>>> going
>>>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
>>>> In that case, if you look for keys in "hour 1", you would only need one
>>>> scanner cause you know that they reside in "h1_", and you could query
>>>> with
>>>> scan(h1_time1, h1_time2).
>>>>
>>>> But at times, as you describe, you may need to scan different buckets
>>>> with
>>>> different scanners and use an ordered list to contain the result.
>>>> - What about performance in that case? For a very large dataset, a range
>>>> query
>>>> will take much time. I can imagine async client at the rescue. Maybe also
>>>> mapreduce jobs could help cause it will benefit from data locality.
>>>> - Also, the client application must manage the salts: it's a bit like
>>>> reinventing a "salt" layer on top of the hbase region servers, letting
>>>> client carry on this layer. The client will have to store (in hbase :))
>>>> the
>>>> mapping between key ranges and their salt prefixes. It's a bit like
>>>> exporting some core? functionality to the client.
>>>>
>>>> Strange, I feel I missed your point :)
>>>> Tks,
>>>>
>>>> - Eric
>>>>
>>>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff
>>>> (should
>>>> get used to, cause I just learned another scripting language running on
>>>> jvm
>>>> for another project...)
>>>>
>>>>
>>>> On 16/03/2011 13:00, Lars George wrote:
>>>>
>>>>> Hi Eric,
>>>>>
>>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>>> it is easy... *cough*) or write that same in a small Java app and use
>>>>> it from the command line or so.
>>>>>
>>>>> And yes, you can range scan using a prefix. We were discussing this
>>>>> recently and there is this notion of design for reads, or design for
>>>>> writes. DFR is usually sequential keys and DFW is random keys. It is
>>>>> tough to find common grounds as both designs are on the far end of the
>>>>> same spectrum. Finding a middle ground is the bucketed (or salted)
>>>>> approach, which gives you distribution but still being able to scan...
>>>>> but not without some client side support. One typical class of data is
>>>>> timeseries based keys. As for scanning them, you need N client side
>>>>> scanners. Imagine this example:
>>>>>
>>>>> row       1 ... 1000 ->     Prefix "h1_"
>>>>> row 1001 ... 2000 ->     Prefix "h2_"
>>>>> row 2001 ... 3000 ->     Prefix "h3_"
>>>>> row 3001 ... 4000 ->     Prefix "h4_"
>>>>> row 4001 ... 5000 ->     Prefix "h5_"
>>>>> row 5001 ... 6000 ->     Prefix "h6_"
>>>>> row 6001 ... 7000 ->     Prefix "h7_"
>>>>>
>>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>>> (also sometimes called salt) are used to distribute the row keys to
>>>>> region servers. To scan the entire range as one large key space you
>>>>> need to create 7 scanners:
>>>>>
>>>>> 1. scanner: start row: "h1_", end row "h2_"
>>>>> 2. scanner: start row: "h2_", end row "h3_"
>>>>> 3. scanner: start row: "h3_", end row "h4_"
>>>>> 4. scanner: start row: "h4_", end row "h5_"
>>>>> 5. scanner: start row: "h5_", end row "h6_"
>>>>> 6. scanner: start row: "h6_", end row "h7_"
>>>>> 7. scanner: start row: "h7_", end row ""
>>>>>
>>>>> Now each of them gives you the first row that matches the start and
>>>>> end row keys they are configured for. So you then take that first KV
>>>>> they offer and add it to a list, sorted by kv.getRow() while removing
>>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>>>>> then split and drop the prefix "h1_" to get "1". The list then would
>>>>> hold something like:
>>>>>
>>>>> 1. row "1" ->     kv from scanner 1
>>>>> 2. row "1010" ->     kv from scanner 2
>>>>> 3. row "2001" ->     kv from scanner 3
>>>>> 4. row "3033" ->     kv from scanner 4
>>>>> 5. row "4001" ->     kv from scanner 5
>>>>> 6. row "5002" ->     kv from scanner 6
>>>>> 7. row "6000" ->     kv from scanner 7
>>>>>
>>>>> (assuming that the keys are not contiguous but have gaps)
>>>>>
>>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>>> offering. Then insert that into the list and you get
>>>>>
>>>>> 1. row "3" ->     kv from scanner 1
>>>>> 2. row "1010" ->     kv from scanner 2
>>>>> 3. row "2001" ->     kv from scanner 3
>>>>> 4. row "3033" ->     kv from scanner 4
>>>>> 5. row "4001" ->     kv from scanner 5
>>>>> 6. row "5002" ->     kv from scanner 6
>>>>> 7. row "6000" ->     kv from scanner 7
>>>>>
>>>>> Notice how you always only have a list with N elements on the client
>>>>> side, each representing the next value the scanners offer. Since the
>>>>> list is sorted you always access item #1 and therefore the next in the
>>>>> entire key space.
>>>>>
>>>>> Once scanner 1 runs out you can close and remove it, the list will
>>>>> then give you values from scanner 2 as the first elements in it. And
>>>>> so on.
>>>>>
>>>>> Makes more sense?
>>>>>
>>>>> Lars
>>>>>
>>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>>>> <er...@u-mangate.com>     wrote:
>>>>>
>>>>>> Hi Lars,
>>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>>> I can find python scripts, but no jruby one...
>>>>>>
>>>>>> Aside from the hash function I could reuse, are you saying that range
>>>>>> queries
>>>>>> are
>>>>>> possible even with hashed keys (randomly distributed)?
>>>>>> (If possible with the script, it will also be possible from the hbase
>>>>>> java
>>>>>> client).
>>>>>> Even with your explanation, I can't figure out how compound keys
>>>>>> (hashedkey+key) can be range-queried.
>>>>>>
>>>>>> Tks,
>>>>>> - Eric
>>>>>>
>>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>>
>>>>>>> Hi Eric,
>>>>>>>
>>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>>> leading hashes to distribute them across servers. When you want to do
>>>>>>> scans you need to create N scans, where N is the number of hashes and
>>>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>>> first. You can then access the rows in sorted order where the first
>>>>>>> element in the list is the one with the first key to read. Once you
>>>>>>> take off the first element (being the lowest KV key) you next the
>>>>>>> underlying scanner and reinsert it into the list, reordering it. You
>>>>>>> keep taking from the top and therefore always see the entire range,
>>>>>>> even if the same scanner would return the next logical rows to read.
>>>>>>>
>>>>>>> The shell is written in JRuby, so any function you can use there would
>>>>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>>>> do this with the above approach in code. Though since this is JRuby
>>>>>>> you could write that code in Ruby and add it to your local shell giving
>>>>>>> you what you need.
>>>>>>>
>>>>>>> Lars
>>>>>>>
>>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>>>> <er...@u-mangate.com>       wrote:
>>>>>>>
>>>>>>>> Oops, forget my first question about range query (if keys are hashed,
>>>>>>>> they
>>>>>>>> can not be queried based on a range...)
>>>>>>>> Still curious to have info on the hash function in the shell (2.) and
>>>>>>>> advice
>>>>>>>> on md5/jenkins/sha1 (3.)
>>>>>>>> Tks,
>>>>>>>> Eric
>>>>>>>>
>>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>>> tables.
>>>>>>>>>
>>>>>>>>> 1. I wonder if this strategy is advised for range queries (from/to
>>>>>>>>> key)
>>>>>>>>> use case, because the rows will be randomly distributed in different
>>>>>>>>> regions. Will it cause some performance loss?
>>>>>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>>>>>> 't1',
>>>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>>>> readable
>>>>>>>>> key.
>>>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What
>>>>>>>>> would
>>>>>>>>> you
>>>>>>>>> advise? What about SHA1?
>>>>>>>>>
>>>>>>>>> Tks,
>>>>>>>>> - Eric
>>>>>>>>>
>>>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>>>
>>>>>>>>>
>>>>
>>
>

Re: Hash keys

Posted by Alex Baranau <al...@gmail.com>.
For those who are looking for the solution to this or similar issue, this
can be useful:

Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which
implements a solution close to what Lars described.
Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO

Alex Baranau
----
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase

On Thu, Mar 17, 2011 at 11:52 AM, Eric Charles
<er...@u-mangate.com> wrote:

> Hi Lars,
> Many tks for your reply.
>
> For now, I just rely on random or hashed keys and don't need any range
> queries.
> I will have to choose a nice solution one day for ordered keys upon which I
> will range-query.
>
> I will post the results of the different data models I will try (looking
> for other threads/articles to have a better view of the options).
>
> Tks,
> - Eric
>
>
> On 16/03/2011 23:02, Lars George wrote:
>
>> Hi Eric,
>>
>> Oops, you are right, my example was not clear and actually confusing
>> the keys with sequential ones. The hash should map every Nth row key
>> to the same bucket, so that you would for example see an interleaved
>> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
>> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
>> big issue. And yes, this is currently all client side driven :(
>>
>> Lars
>>
>> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>
>>> Hi Lars,
>>> Many tks for your explanations!
>>>
>>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>>> different cases (just rephrasing what you said to be sure I get it):
>>>
>>> - Keys are really random (GUID or whatever): you have the distribution
>>> for
>>> free, still can't do, and probably don't need, range-queries.
>>>
>>> - If keys are monotonically increasing (timestamp, autoincremented,...),
>>> there are two cases:
>>> 1) sometimes, you don't need to do some range-queries and can store the
>>> key
>>> as a real hash (md5,...) to have distribution.
>>> 2) For timebased series for example, you may need to do some range
>>> queries,
>>> and adding a salt can be an answer to combine best-of-world.
>>>
>>> I understand the "salt" approach as recreating on the client side
>>> "artifical" key spaces.
>>>
>>> I was first confused reading "row 1...1000 ->  prefix h1_".
>>> To really make the distribution random, I would have seen prefix/salt
>>> attributed randomly for a key leading to for example a h1 keyspace as
>>> such:
>>> h1_key2032, h1_key0023, h1_key1014343, ...
>>>
>>> Maybe you meant the intermediate approach where time keys of "hour 1"
>>> going
>>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
>>> In that case, if you look for keys in "hour 1", you would only need one
>>> scanner cause you know that they reside in "h1_", and you could query
>>> with
>>> scan(h1_time1, h1_time2).
>>>
>>> But at times, as you describe, you may need to scan different buckets
>>> with
>>> different scanners and use an ordered list to contain the result.
>>> - What about performance in that case? For a very large dataset, a range
>>> query
>>> will take much time. I can imagine async client at the rescue. Maybe also
>>> mapreduce jobs could help cause it will benefit from data locality.
>>> - Also, the client application must manage the salts: it's a bit like
>>> reinventing a "salt" layer on top of the hbase region servers, letting
>>> client carry on this layer. The client will have to store (in hbase :))
>>> the
>>> mapping between key ranges and their salt prefixes. It's a bit like
>>> exporting some core? functionality to the client.
>>>
>>> Strange, I feel I missed your point :)
>>> Tks,
>>>
>>> - Eric
>>>
>>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff
>>> (should
>>> get used to, cause I just learned another scripting language running on
>>> jvm
>>> for another project...)
>>>
>>>
>>> On 16/03/2011 13:00, Lars George wrote:
>>>
>>>> Hi Eric,
>>>>
>>>> Socorro is Java and Python, I was just mentioning it as a possible
>>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>>> it is easy... *cough*) or write that same in a small Java app and use
>>>> it from the command line or so.
>>>>
>>>> And yes, you can range scan using a prefix. We were discussing this
>>>> recently and there is this notion of design for reads, or design for
>>>> writes. DFR is usually sequential keys and DFW is random keys. It is
>>>> tough to find common grounds as both designs are on the far end of the
>>>> same spectrum. Finding a middle ground is the bucketed (or salted)
>>>> approach, which gives you distribution but still being able to scan...
>>>> but not without some client side support. One typical class of data is
>>>> timeseries based keys. As for scanning them, you need N client side
>>>> scanners. Imagine this example:
>>>>
>>>> row       1 ... 1000 ->    Prefix "h1_"
>>>> row 1001 ... 2000 ->    Prefix "h2_"
>>>> row 2001 ... 3000 ->    Prefix "h3_"
>>>> row 3001 ... 4000 ->    Prefix "h4_"
>>>> row 4001 ... 5000 ->    Prefix "h5_"
>>>> row 5001 ... 6000 ->    Prefix "h6_"
>>>> row 6001 ... 7000 ->    Prefix "h7_"
>>>>
>>>> So you have divided the entire range into 7 buckets. The prefixes
>>>> (also sometimes called salt) are used to distribute the row keys to
>>>> region servers. To scan the entire range as one large key space you
>>>> need to create 7 scanners:
>>>>
>>>> 1. scanner: start row: "h1_", end row "h2_"
>>>> 2. scanner: start row: "h2_", end row "h3_"
>>>> 3. scanner: start row: "h3_", end row "h4_"
>>>> 4. scanner: start row: "h4_", end row "h5_"
>>>> 5. scanner: start row: "h5_", end row "h6_"
>>>> 6. scanner: start row: "h6_", end row "h7_"
>>>> 7. scanner: start row: "h7_", end row ""
>>>>
>>>> Now each of them gives you the first row that matches the start and
>>>> end row keys they are configured for. So you then take that first KV
>>>> they offer and add it to a list, sorted by kv.getRow() while removing
>>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>>>> then split and drop the prefix "h1_" to get "1". The list then would
>>>> hold something like:
>>>>
>>>> 1. row "1" ->    kv from scanner 1
>>>> 2. row "1010" ->    kv from scanner 2
>>>> 3. row "2001" ->    kv from scanner 3
>>>> 4. row "3033" ->    kv from scanner 4
>>>> 5. row "4001" ->    kv from scanner 5
>>>> 6. row "5002" ->    kv from scanner 6
>>>> 7. row "6000" ->    kv from scanner 7
>>>>
>>>> (assuming that the keys are not contiguous but have gaps)
>>>>
>>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>>> offering. Then insert that into the list and you get
>>>>
>>>> 1. row "3" ->    kv from scanner 1
>>>> 2. row "1010" ->    kv from scanner 2
>>>> 3. row "2001" ->    kv from scanner 3
>>>> 4. row "3033" ->    kv from scanner 4
>>>> 5. row "4001" ->    kv from scanner 5
>>>> 6. row "5002" ->    kv from scanner 6
>>>> 7. row "6000" ->    kv from scanner 7
>>>>
>>>> Notice how you always only have a list with N elements on the client
>>>> side, each representing the next value the scanners offer. Since the
>>>> list is sorted you always access item #1 and therefore the next in the
>>>> entire key space.
>>>>
>>>> Once scanner 1 runs out you can close and remove it, the list will
>>>> then give you values from scanner 2 as the first elements in it. And
>>>> so on.
>>>>
>>>> Makes more sense?
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>>> <er...@u-mangate.com>    wrote:
>>>>
>>>>> Hi Lars,
>>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>>> I can find python scripts, but no jruby one...
>>>>>
>>>>> Aside from the hash function I could reuse, are you saying that range
>>>>> queries
>>>>> are
>>>>> possible even with hashed keys (randomly distributed)?
>>>>> (If possible with the script, it will also be possible from the hbase
>>>>> java
>>>>> client).
>>>>> Even with your explanation, I can't figure out how compound keys
>>>>> (hashedkey+key) can be range-queried.
>>>>>
>>>>> Tks,
>>>>> - Eric
>>>>>
>>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>>
>>>>>> Hi Eric,
>>>>>>
>>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>>> leading hashes to distribute them across servers. When you want to do
>>>>>> scans you need to create N scans, where N is the number of hashes and
>>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>>> first. You can then access the rows in sorted order where the first
>>>>>> element in the list is the one with the first key to read. Once you
>>>>>> take off the first element (being the lowest KV key) you next the
>>>>>> underlying scanner and reinsert it into the list, reordering it. You
>>>>>> keep taking from the top and therefore always see the entire range,
>>>>>> even if the same scanner would return the next logical rows to read.
>>>>>>
>>>>>> The shell is written in JRuby, so any function you can use there would
>>>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>>> do this with the above approach in code. Though since this is JRuby
>>>>>> you could write that code in Ruby and add it to your local shell giving
>>>>>> you what you need.
>>>>>>
>>>>>> Lars
>>>>>>
>>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>>> <er...@u-mangate.com>      wrote:
>>>>>>
>>>>>>> Oops, forget my first question about range query (if keys are hashed,
>>>>>>> they
>>>>>>> can not be queried based on a range...)
>>>>>>> Still curious to have info on the hash function in the shell (2.) and
>>>>>>> advice
>>>>>>> on md5/jenkins/sha1 (3.)
>>>>>>> Tks,
>>>>>>> Eric
>>>>>>>
>>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>>> tables.
>>>>>>>>
>>>>>>>> 1. I wonder if this strategy is advised for range queries (from/to
>>>>>>>> key)
>>>>>>>> use case, because the rows will be randomly distributed in different
>>>>>>>> regions. Will it cause some performance loss?
>>>>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>>>>> 't1',
>>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>>> readable
>>>>>>>> key.
>>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What
>>>>>>>> would
>>>>>>>> you
>>>>>>>> advise? What about SHA1?
>>>>>>>>
>>>>>>>> Tks,
>>>>>>>> - Eric
>>>>>>>>
>>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>>
>>>>>>>>
>>>
>

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi Lars,
Many tks for your reply.

For now, I just rely on random or hashed keys and don't need any range 
queries.
I will have to choose a nice solution one day for ordered keys upon 
which I will range-query.

I will post the results of the different data models I will try (looking 
for other threads/articles to have a better view of the options).

Tks,
- Eric

On 16/03/2011 23:02, Lars George wrote:
> Hi Eric,
>
> Oops, you are right, my example was not clear and actually confusing
> the keys with sequential ones. The hash should map every Nth row key
> to the same bucket, so that you would for example see an interleaved
> distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
> region 2 holds 2, 9, 16,... and so on. I do not think performance is a
> big issue. And yes, this is currently all client side driven :(
>
> Lars
>
> On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Hi Lars,
>> Many tks for your explanations!
>>
>> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
>> different cases (just rephrasing what you said to be sure I get it):
>>
>> - Keys are really random (GUID or whatever): you have the distribution for
>> free, still can't do, and probably don't need, range-queries.
>>
>> - If keys are monotonically increasing (timestamp, autoincremented,...),
>> there are two cases:
>> 1) sometimes, you don't need to do some range-queries and can store the key
>> as a real hash (md5,...) to have distribution.
>> 2) For timebased series for example, you may need to do some range queries,
>> and adding a salt can be an answer to combine best-of-world.
>>
>> I understand the "salt" approach as recreating on the client side
>> "artifical" key spaces.
>>
>> I was first confused reading "row 1...1000 ->  prefix h1_".
>> To really make the distribution random, I would have seen prefix/salt
>> attributed randomly for a key leading to for example a h1 keyspace as such:
>> h1_key2032, h1_key0023, h1_key1014343, ...
>>
>> Maybe you meant the intermediate approach where time keys of "hour 1" going
>> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
>> In that case, if you look for keys in "hour 1", you would only need one
>> scanner cause you know that they reside in "h1_", and you could query with
>> scan(h1_time1, h1_time2).
>>
>> But at times, as you describe, you may need to scan different buckets with
>> different scanners and use an ordered list to contain the result.
>> - What about performance in that case? For a very large dataset, a range query
>> will take much time. I can imagine async client at the rescue. Maybe also
>> mapreduce jobs could help cause it will benefit from data locality.
>> - Also, the client application must manage the salts: it's a bit like
>> reinventing a "salt" layer on top of the hbase region servers, letting
>> client carry on this layer. The client will have to store (in hbase :)) the
>> mapping between key ranges and their salt prefixes. It's a bit like
>> exporting some core? functionality to the client.
>>
>> Strange, I feel I missed your point :)
>> Tks,
>>
>> - Eric
>>
>> Sidenote: ...and yes, it seems I will have to learn some ruby stuff (should
>> get used to, cause I just learned another scripting language running on jvm
>> for another project...)
>>
>>
>> On 16/03/2011 13:00, Lars George wrote:
>>> Hi Eric,
>>>
>>> Socorro is Java and Python, I was just mentioning it as a possible
>>> source of inspiration :) You can learn Ruby and implement it (I hear
>>> it is easy... *cough*) or write that same in a small Java app and use
>>> it from the command line or so.
>>>
>>> And yes, you can range scan using a prefix. We were discussing this
>>> recently and there is this notion of design for reads, or design for
>>> writes. DFR is usually sequential keys and DFW is random keys. It is
>>> tough to find common grounds as both designs are on the far end of the
>>> same spectrum. Finding a middle ground is the bucketed (or salted)
>>> approach, which gives you distribution but still being able to scan...
>>> but not without some client side support. One typical class of data is
>>> timeseries based keys. As for scanning them, you need N client side
>>> scanners. Imagine this example:
>>>
>>> row       1 ... 1000 ->    Prefix "h1_"
>>> row 1001 ... 2000 ->    Prefix "h2_"
>>> row 2001 ... 3000 ->    Prefix "h3_"
>>> row 3001 ... 4000 ->    Prefix "h4_"
>>> row 4001 ... 5000 ->    Prefix "h5_"
>>> row 5001 ... 6000 ->    Prefix "h6_"
>>> row 6001 ... 7000 ->    Prefix "h7_"
>>>
>>> So you have divided the entire range into 7 buckets. The prefixes
>>> (also sometimes called salt) are used to distribute the row keys to
>>> region servers. To scan the entire range as one large key space you
>>> need to create 7 scanners:
>>>
>>> 1. scanner: start row: "h1_", end row "h2_"
>>> 2. scanner: start row: "h2_", end row "h3_"
>>> 3. scanner: start row: "h3_", end row "h4_"
>>> 4. scanner: start row: "h4_", end row "h5_"
>>> 5. scanner: start row: "h5_", end row "h6_"
>>> 6. scanner: start row: "h6_", end row "h7_"
>>> 7. scanner: start row: "h7_", end row ""
>>>
>>> Now each of them gives you the first row that matches the start and
>>> end row keys they are configured for. So you then take that first KV
>>> they offer and add it to a list, sorted by kv.getRow() while removing
>>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>>> then split and drop the prefix "h1_" to get "1". The list then would
>>> hold something like:
>>>
>>> 1. row "1" ->    kv from scanner 1
>>> 2. row "1010" ->    kv from scanner 2
>>> 3. row "2001" ->    kv from scanner 3
>>> 4. row "3033" ->    kv from scanner 4
>>> 5. row "4001" ->    kv from scanner 5
>>> 6. row "5002" ->    kv from scanner 6
>>> 7. row "6000" ->    kv from scanner 7
>>>
>>> (assuming that the keys are not contiguous but have gaps)
>>>
>>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>>> offering. Then insert that into the list and you get
>>>
>>> 1. row "3" ->    kv from scanner 1
>>> 2. row "1010" ->    kv from scanner 2
>>> 3. row "2001" ->    kv from scanner 3
>>> 4. row "3033" ->    kv from scanner 4
>>> 5. row "4001" ->    kv from scanner 5
>>> 6. row "5002" ->    kv from scanner 6
>>> 7. row "6000" ->    kv from scanner 7
>>>
>>> Notice how you always only have a list with N elements on the client
>>> side, each representing the next value the scanners offer. Since the
>>> list is sorted you always access item #1 and therefore the next in the
>>> entire key space.
>>>
>>> Once scanner 1 runs out you can close and remove it, the list will
>>> then give you values from scanner 2 as the first elements in it. And
>>> so on.
>>>
>>> Makes more sense?
>>>
>>> Lars
>>>
>>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>>> <er...@u-mangate.com>    wrote:
>>>> Hi Lars,
>>>> Are you talking about http://code.google.com/p/socorro/ ?
>>>> I can find python scripts, but no jruby one...
>>>>
>>>> Aside from the hash function I could reuse, are you saying that range queries
>>>> are
>>>> possible even with hashed keys (randomly distributed)?
>>>> (If possible with the script, it will also be possible from the hbase
>>>> java
>>>> client).
>>>> Even with your explanation, I can't figure out how compound keys
>>>> (hashedkey+key) can be range-queried.
>>>>
>>>> Tks,
>>>> - Eric
>>>>
>>>> On 16/03/2011 11:38, Lars George wrote:
>>>>> Hi Eric,
>>>>>
>>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>>> leading hashes to distribute them across servers. When you want to do
>>>>> scans you need to create N scans, where N is the number of hashes and
>>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>>> first. You can then access the rows in sorted order where the first
>>>>> element in the list is the one with the first key to read. Once you
>>>>> take off the first element (being the lowest KV key) you next the
>>>>> underlying scanner and reinsert it into the list, reordering it. You
>>>>> keep taking from the top and therefore always see the entire range,
>>>>> even if the same scanner would return the next logical rows to read.
>>>>>
>>>>> The shell is written in JRuby, so any function you can use there would
>>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>>> This will not help with merging the bucketed key ranges, you need to
>>>>> do this with the above approach in code. Though since this is JRuby
>>>>> you could write that code in Ruby and add it to your local shell giving
>>>>> you what you need.
>>>>>
>>>>> Lars
>>>>>
>>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>>> <er...@u-mangate.com>      wrote:
>>>>>> Oops, forget my first question about range query (if keys are hashed,
>>>>>> they
>>>>>> can not be queried based on a range...)
>>>>>> Still curious to have info on the hash function in the shell (2.) and
>>>>>> advice
>>>>>> on md5/jenkins/sha1 (3.)
>>>>>> Tks,
>>>>>> Eric
>>>>>>
>>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>>> tables.
>>>>>>>
>>>>>>> 1. I wonder if this strategy is advised for range queries (from/to
>>>>>>> key)
>>>>>>> use case, because the rows will be randomly distributed in different
>>>>>>> regions. Will it cause some performance loss?
>>>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>>>> 't1',
>>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>>> readable
>>>>>>> key.
>>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>>>>>> you
>>>>>>> advise? What about SHA1?
>>>>>>>
>>>>>>> Tks,
>>>>>>> - Eric
>>>>>>>
>>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>>
>>


Re: Hash keys

Posted by Lars George <la...@gmail.com>.
Hi Eric,

Oops, you are right, my example was not clear and actually confusing
the keys with sequential ones. The hash should map every Nth row key
to the same bucket, so that you would for example see an interleaved
distribution of row keys to regions. Region 1 holds 1, 8, 15,... while
region 2 holds 2, 9, 16,... and so on. I do not think performance is a
big issue. And yes, this is currently all client side driven :(

Lars
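
For illustration, a rough Java sketch of the client-side merge described in
the quoted explanation below: one scanner per salt bucket feeding a priority
queue (standing in for the hand-sorted list), always consuming the smallest
logical row. The salt scheme (single-digit "hN_" prefixes, stripped as a
fixed 3 bytes) and the class names here are assumptions for the example, not
an HBase API:

import java.io.IOException;
import java.util.PriorityQueue;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

/** Sketch: merge N salted bucket scans into one key-ordered stream. */
public class SaltedRangeScan {

  /** A bucket scanner plus its current head row, ordered by row minus salt. */
  static class Head implements Comparable<Head> {
    final ResultScanner scanner;
    Result current;

    Head(ResultScanner scanner, Result first) {
      this.scanner = scanner;
      this.current = first;
    }

    byte[] logicalRow() {
      byte[] row = current.getRow();
      return Bytes.tail(row, row.length - 3); // drop the 3-byte "hN_" salt
    }

    public int compareTo(Head other) {
      return Bytes.compareTo(logicalRow(), other.logicalRow());
    }
  }

  public static void scanAll(HTable table, int buckets) throws IOException {
    PriorityQueue<Head> heads = new PriorityQueue<Head>();
    // One scanner per bucket: ["h1_","h2_"), ["h2_","h3_"), ..., last open-ended.
    for (int i = 1; i <= buckets; i++) {
      byte[] start = Bytes.toBytes("h" + i + "_");
      byte[] stop = (i == buckets)
          ? HConstants.EMPTY_END_ROW : Bytes.toBytes("h" + (i + 1) + "_");
      ResultScanner rs = table.getScanner(new Scan(start, stop));
      Result first = rs.next();
      if (first != null) heads.add(new Head(rs, first)); else rs.close();
    }
    while (!heads.isEmpty()) {
      Head head = heads.poll();                  // lowest logical row overall
      process(head.logicalRow(), head.current);
      head.current = head.scanner.next();        // advance that one scanner...
      if (head.current != null) heads.add(head); // ...and reinsert, reordering
      else head.scanner.close();                 // bucket exhausted, drop it
    }
  }

  static void process(byte[] logicalRow, Result result) {
    // application logic goes here
  }
}

This per-next() reordering is what libs like HBaseWD (mentioned in the
replies above) package up so the client does not have to hand-roll it.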

On Wed, Mar 16, 2011 at 2:57 PM, Eric Charles
<er...@u-mangate.com> wrote:
> Hi Lars,
> Many tks for your explanations!
>
> About DFR (sequential-keys) vs DFW (random-keys) distinction, I imagine
> different cases (just rephrasing what you said to be sure I get it):
>
> - Keys are really random (GUID or whatever): you have the distribution for
> free, still can't do, and probably don't need, range-queries.
>
> - If keys are monotonically increasing (timestamp, autoincremented,...),
> there are two cases:
> 1) sometimes, you don't need to do some range-queries and can store the key
> as a real hash (md5,...) to have distribution.
> 2) For timebased series for example, you may need to do some range queries,
> and adding a salt can be an answer to combine best-of-world.
>
> I understand the "salt" approach as recreating on the client side
> "artifical" key spaces.
>
> I was first confused reading "row 1...1000 -> prefix h1_".
> To really make the distribution random, I would have seen prefix/salt
> attributed randomly for a key leading to for example a h1 keyspace as such:
> h1_key2032, h1_key0023, h1_key1014343, ...
>
> Maybe you meant the intermediate approach where time keys of "hour 1" going
> to h1 keyspace, keys of "hour 2" going to h2 keyspace,...
> In that case, if you look for keys in "hour 1", you would only need one
> scanner cause you know that they reside in "h1_", and you could query with
> scan(h1_time1, h1_time2).
>
> But at times, as you describe, you may need to scan different buckets with
> different scanners and use an ordered list to contain the result.
> - What about performance in that case? For a very large dataset, a range query
> will take much time. I can imagine async client at the rescue. Maybe also
> mapreduce jobs could help cause it will benefit from data locality.
> - Also, the client application must manage the salts: it's a bit like
> reinventing a "salt" layer on top of the hbase region servers, letting
> client carry on this layer. The client will have to store (in hbase :)) the
> mapping between key ranges and their salt prefixes. It's a bit like
> exporting some core? functionality to the client.
>
> Strange, I feel I missed your point :)
> Tks,
>
> - Eric
>
> Sidenote: ...and yes, it seems I will have to learn some ruby stuff (should
> get used to, cause I just learned another scripting language running on jvm
> for another project...)
>
>
> On 16/03/2011 13:00, Lars George wrote:
>>
>> Hi Eric,
>>
>> Socorro is Java and Python, I was just mentioning it as a possible
>> source of inspiration :) You can learn Ruby and implement it (I hear
>> it is easy... *cough*) or write that same in a small Java app and use
>> it from the command line or so.
>>
>> And yes, you can range scan using a prefix. We were discussing this
>> recently and there is this notion of design for reads, or design for
>> writes. DFR is usually sequential keys and DFW is random keys. It is
>> tough to find common grounds as both designs are on the far end of the
>> same spectrum. Finding a middle ground is the bucketed (or salted)
>> approach, which gives you distribution but still being able to scan...
>> but not without some client side support. One typical class of data is
>> timeseries based keys. As for scanning them, you need N client side
>> scanners. Imagine this example:
>>
>> row       1 ... 1000 ->  Prefix "h1_"
>> row 1001 ... 2000 ->  Prefix "h2_"
>> row 2001 ... 3000 ->  Prefix "h3_"
>> row 3001 ... 4000 ->  Prefix "h4_"
>> row 4001 ... 5000 ->  Prefix "h5_"
>> row 5001 ... 6000 ->  Prefix "h6_"
>> row 6001 ... 7000 ->  Prefix "h7_"
>>
>> So you have divided the entire range into 7 buckets. The prefixes
>> (also sometimes called salt) are used to distribute the row keys to
>> region servers. To scan the entire range as one large key space you
>> need to create 7 scanners:
>>
>> 1. scanner: start row: "h1_", end row "h2_"
>> 2. scanner: start row: "h2_", end row "h3_"
>> 3. scanner: start row: "h3_", end row "h4_"
>> 4. scanner: start row: "h4_", end row "h5_"
>> 5. scanner: start row: "h5_", end row "h6_"
>> 6. scanner: start row: "h6_", end row "h7_"
>> 7. scanner: start row: "h7_", end row ""
>>
>> Now each of them gives you the first row that matches the start and
>> end row keys they are configured for. So you then take that first KV
>> they offer and add it to a list, sorted by kv.getRow() while removing
>> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
>> then split and drop the prefix "h1_" to get "1". The list then would
>> hold something like:
>>
>> 1. row "1" ->  kv from scanner 1
>> 2. row "1010" ->  kv from scanner 2
>> 3. row "2001" ->  kv from scanner 3
>> 4. row "3033" ->  kv from scanner 4
>> 5. row "4001" ->  kv from scanner 5
>> 6. row "5002" ->  kv from scanner 6
>> 7. row "6000" ->  kv from scanner 7
>>
>> (assuming that the keys are not contiguous but have gaps)
>>
>> You then pop element #1 and do a "scanner1.next()" to get its next KV
>> offering. Then insert that into the list and you get
>>
>> 1. row "3" ->  kv from scanner 1
>> 2. row "1010" ->  kv from scanner 2
>> 3. row "2001" ->  kv from scanner 3
>> 4. row "3033" ->  kv from scanner 4
>> 5. row "4001" ->  kv from scanner 5
>> 6. row "5002" ->  kv from scanner 6
>> 7. row "6000" ->  kv from scanner 7
>>
>> Notice how you only ever have a list with N elements on the client
>> side, each representing the next value the scanners offer. Since the
>> list is sorted you always access item #1 and therefore the next in the
>> entire key space.
>>
>> Once scanner 1 runs out you can close and remove it, the list will
>> then give you values from scanner 2 as the first elements in it. And
>> so on.
>>
>> Makes more sense?
>>
>> Lars
>>
>> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>>
>>> Hi Lars,
>>> Are you talking about http://code.google.com/p/socorro/ ?
>>> I can find python scripts, but no jruby one...
>>>
>>> Aside from the hash function I could reuse, are you saying that range queries
>>> are
>>> possible even with hashed keys (randomly distributed)?
>>> (If possible with the script, it will also be possible from the hbase
>>> java
>>> client).
>>> Even with your explanation, I can't figure out how compound keys
>>> (hashedkey+key) can be range-queried.
>>>
>>> Tks,
>>> - Eric
>>>
>>> On 16/03/2011 11:38, Lars George wrote:
>>>>
>>>> Hi Eric,
>>>>
>>>> Mozilla Socorro uses an approach where they bucket ranges using
>>>> leading hashes to distribute them across servers. When you want to do
>>>> scans you need to create N scans, where N is the number of hashes and
>>>> then do a next() on each scanner, putting all KVs into one sorted list
>>>> (use the KeyComparator for example) while stripping the prefix hash
>>>> first. You can then access the rows in sorted order where the first
>>>> element in the list is the one with the first key to read. Once you
>>>> take off the first element (being the lowest KV key) you call next() on
>>>> the underlying scanner and reinsert it into the list, reordering it. You
>>>> keep taking from the top and therefore always see the entire range,
>>>> even if the same scanner would return the next logical rows to read.
>>>>
>>>> The shell is written in JRuby, so any function you can use there would
>>>> make sense to use in the prefix, then you could compute it on the fly.
>>>> This will not help with merging the bucketed key ranges, you need to
>>>> do this with the above approach in code. Though since this is JRuby
>>>> you could write that code in Ruby and add it to your local shell, giving
>>>> you what you need.
>>>>
>>>> Lars
>>>>
>>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>>> <er...@u-mangate.com>    wrote:
>>>>>
>>>>> Oops, forget my first question about range query (if keys are hashed,
>>>>> they
>>>>> can not be queried based on a range...)
>>>>> Still curious to have info on the hash function in the shell (2.) and
>>>>> advice
>>>>> on md5/jenkins/sha1 (3.)
>>>>> Tks,
>>>>> Eric
>>>>>
>>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> To help avoid hotspots, I'm planning to use hashed keys in some
>>>>>> tables.
>>>>>>
>>>>>> 1. I wonder if this strategy is advised for range queries (from/to
>>>>>> key)
>>>>>> use case, because the rows will be randomly distributed in different
>>>>>> regions. Will it cause some performance loss?
>>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>>> 't1',
>>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>>> readable
>>>>>> key.
>>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>>>>> you
>>>>>> advise? What about SHA1?
>>>>>>
>>>>>> Tks,
>>>>>> - Eric
>>>>>>
>>>>>> PS: I searched the archive but didn't find the answers.
>>>>>>
>>>
>
>

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi Lars,
Many tks for your explanations!

About the DFR (sequential keys) vs DFW (random keys) distinction, I imagine 
different cases (just rephrasing what you said to be sure I get it):

- Keys are really random (GUID or whatever): you get the distribution 
for free; you still can't do, and probably don't need, range queries.

- If keys are monotonically increasing (timestamp, autoincremented,...), 
there are two cases:
1) Sometimes you don't need range queries and can store the 
key as a real hash (md5, ...) to get distribution.
2) For time-based series, for example, you may need range queries, 
and adding a salt can be an answer that combines the best of both worlds.

I understand the "salt" approach as recreating on the client side 
"artifical" key spaces.

I was first confused reading "row 1...1000 -> prefix h1_".
To really make the distribution random, I would have expected the prefix/salt 
to be attributed randomly per key, leading for example to an h1 keyspace 
such as: h1_key2032, h1_key0023, h1_key1014343, ...

Maybe you meant the intermediate approach where time keys of "hour 1" 
go to the h1 keyspace, keys of "hour 2" go to the h2 keyspace,...
In that case, if you look for keys in "hour 1", you would only need one 
scanner because you know that they reside in "h1_", and you could query 
with scan(h1_time1, h1_time2).

But at any time, as you describe, you may need to scan different buckets 
with different scanners and use an ordered list to hold the result.
- What about performance in that case? For a very large dataset, a range 
query will take a long time. I can imagine the async client coming to the 
rescue. Maybe mapreduce jobs could also help because they will benefit 
from data locality.
- Also, the client application must manage the salts: it's a bit like 
reinventing a "salt" layer on top of the hbase region servers, letting the 
client carry this layer. The client will have to store (in hbase :)) 
the mapping between key ranges and their salt prefixes. It's a bit like 
exporting some core functionality to the client.

Strange, I feel I missed your point :)
Tks,

- Eric

Sidenote: ...and yes, it seems I will have to learn some ruby stuff 
(I should get used to it, since I just learned another scripting language 
running on the jvm for another project...)
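
PS: to make the salt-management point concrete, here is a minimal Java 
sketch of the two variants (the bucket count, class and method names, and 
key layout are only illustrative assumptions on my side). Note that when 
the salt is a pure function of the key, the client can recompute the prefix 
on demand, so no mapping between key ranges and salt prefixes has to be 
stored anywhere:

import org.apache.hadoop.hbase.util.Bytes;

public class SaltedKeys {
  private static final int BUCKETS = 7; // assumption: fixed, known bucket count

  // Hash-derived salt: distribution for free, and the prefix is
  // recomputable from the key itself, so no mapping table is needed.
  static byte[] hashSaltedKey(String key) {
    int bucket = (key.hashCode() & Integer.MAX_VALUE) % BUCKETS + 1;
    return Bytes.toBytes("h" + bucket + "_" + key);
  }

  // Time-bucket salt: all keys of the same hour share one prefix, so an
  // in-hour range query still needs only one scan(h<hour>_from, h<hour>_to),
  // at the price of that hour's writes all hitting the same region.
  static byte[] hourBucketKey(long timestampMillis) {
    long hour = timestampMillis / (3600L * 1000L);
    return Bytes.toBytes("h" + hour + "_" + timestampMillis);
  }
}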


On 16/03/2011 13:00, Lars George wrote:
> Hi Eric,
>
> Socorro is Java and Python, I was just mentioning it as a possible
> source of inspiration :) You can learn Ruby and implement it (I hear
> it is easy... *cough*) or write the same in a small Java app and use
> it from the command line or so.
>
> And yes, you can range scan using a prefix. We were discussing this
> recently and there is this notion of design for reads, or design for
> writes. DFR is usually sequential keys and DFW is random keys. It is
> tough to find common ground as both designs are at the far ends of the
> same spectrum. Finding a middle ground is the bucketed (or salted)
> approach, which gives you distribution while still being able to scan...
> but not without some client-side support. One typical class of data is
> timeseries-based keys. As for scanning them, you need N client-side
> scanners. Imagine this example:
>
> row       1 ... 1000 ->  Prefix "h1_"
> row 1001 ... 2000 ->  Prefix "h2_"
> row 2001 ... 3000 ->  Prefix "h3_"
> row 3001 ... 4000 ->  Prefix "h4_"
> row 4001 ... 5000 ->  Prefix "h5_"
> row 5001 ... 6000 ->  Prefix "h6_"
> row 6001 ... 7000 ->  Prefix "h7_"
>
> So you have divided the entire range into 7 buckets. The prefixes
> (also sometimes called salt) are used to distribute them row keys to
> region servers. To scan the entire range as one large key space you
> need to create 7 scanners:
>
> 1. scanner: start row: "h1_", end row "h2_"
> 2. scanner: start row: "h2_", end row "h3_"
> 3. scanner: start row: "h3_", end row "h4_"
> 4. scanner: start row: "h4_", end row "h5_"
> 5. scanner: start row: "h5_", end row "h6_"
> 6. scanner: start row: "h6_", end row "h7_"
> 7. scanner: start row: "h7_", end row ""
>
> Now each of them gives you the first row that matches the start and
> end row keys they are configured for. So you then take that first KV
> they offer and add it to a list, sorted by kv.getRow() while removing
> the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
> then split and drop the prefix "h1_" to get "1". The list then would
> hold something like:
>
> 1. row "1" ->  kv from scanner 1
> 2. row "1010" ->  kv from scanner 2
> 3. row "2001" ->  kv from scanner 3
> 4. row "3033" ->  kv from scanner 4
> 5. row "4001" ->  kv from scanner 5
> 6. row "5002" ->  kv from scanner 6
> 7. row "6000" ->  kv from scanner 7
>
> (assuming that the keys are not contiguous but have gaps)
>
> You then pop element #1 and do a "scanner1.next()" to get its next KV
> offering. Then insert that into the list and you get
>
> 1. row "3" ->  kv from scanner 1
> 2. row "1010" ->  kv from scanner 2
> 3. row "2001" ->  kv from scanner 3
> 4. row "3033" ->  kv from scanner 4
> 5. row "4001" ->  kv from scanner 5
> 6. row "5002" ->  kv from scanner 6
> 7. row "6000" ->  kv from scanner 7
>
> Notice how you only ever have a list with N elements on the client
> side, each representing the next value the scanners offer. Since the
> list is sorted you always access item #1 and therefore the next in the
> entire key space.
>
> Once scanner 1 runs out you can close and remove it, the list will
> then give you values from scanner 2 as the first elements in it. And
> so on.
>
> Makes more sense?
>
> Lars
>
> On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Hi Lars,
>> Are you talking about http://code.google.com/p/socorro/ ?
>> I can find python scripts, but no jruby one...
>>
>> Aside from the hash function I could reuse, are you saying that range queries are
>> possible even with hashed keys (randomly distributed)?
>> (If possible with the script, it will also be possible from the hbase java
>> client).
>> Even with your explanation, I can't figure out how compound keys
>> (hashedkey+key) can be range-queried.
>>
>> Tks,
>> - Eric
>>
>> On 16/03/2011 11:38, Lars George wrote:
>>> Hi Eric,
>>>
>>> Mozilla Socorro uses an approach where they bucket ranges using
>>> leading hashes to distribute them across servers. When you want to do
>>> scans you need to create N scans, where N is the number of hashes and
>>> then do a next() on each scanner, putting all KVs into one sorted list
>>> (use the KeyComparator for example) while stripping the prefix hash
>>> first. You can then access the rows in sorted order where the first
>>> element in the list is the one with the first key to read. Once you
>>> take off the first element (being the lowest KV key) you call next() on
>>> the underlying scanner and reinsert it into the list, reordering it. You
>>> keep taking from the top and therefore always see the entire range,
>>> even if the same scanner would return the next logical rows to read.
>>>
>>> The shell is written in JRuby, so any function you can use there would
>>> make sense to use in the prefix, then you could compute it on the fly.
>>> This will not help with merging the bucketed key ranges, you need to
>>> do this with the above approach in code. Though since this is JRuby
>>> you could write that code in Ruby and add it to your local shell, giving
>>> you what you need.
>>>
>>> Lars
>>>
>>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>>> <er...@u-mangate.com>    wrote:
>>>> Oops, forget my first question about range query (if keys are hashed,
>>>> they
>>>> can not be queried based on a range...)
>>>> Still curious to have info on the hash function in the shell (2.) and
>>>> advice
>>>> on md5/jenkins/sha1 (3.)
>>>> Tks,
>>>> Eric
>>>>
>>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>> Hi,
>>>>>
>>>>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>>>>
>>>>> 1. I wonder if this strategy is advised for range queries (from/to key)
>>>>> use case, because the rows will be randomly distributed in different
>>>>> regions. Will it cause some performance loss?
>>>>> 2. Is it possible to query from hbase shell with something like "get
>>>>> 't1',
>>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>>> readable
>>>>> key.
>>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>>>> you
>>>>> advise? What about SHA1?
>>>>>
>>>>> Tks,
>>>>> - Eric
>>>>>
>>>>> PS: I searched the archive but didn't find the answers.
>>>>>
>>


Re: Hash keys

Posted by Lars George <la...@gmail.com>.
Hi Eric,

Socorro is Java and Python, I was just mentioning it as a possible
source of inspiration :) You can learn Ruby and implement it (I hear
it is easy... *cough*) or write the same in a small Java app and use
it from the command line or so.

And yes, you can range scan using a prefix. We were discussing this
recently and there is this notion of design for reads, or design for
writes. DFR is usually sequential keys and DFW is random keys. It is
tough to find common ground as both designs are at the far ends of the
same spectrum. Finding a middle ground is the bucketed (or salted)
approach, which gives you distribution while still being able to scan...
but not without some client-side support. One typical class of data is
timeseries-based keys. As for scanning them, you need N client-side
scanners. Imagine this example:

row       1 ... 1000 -> Prefix "h1_"
row 1001 ... 2000 -> Prefix "h2_"
row 2001 ... 3000 -> Prefix "h3_"
row 3001 ... 4000 -> Prefix "h4_"
row 4001 ... 5000 -> Prefix "h5_"
row 5001 ... 6000 -> Prefix "h6_"
row 6001 ... 7000 -> Prefix "h7_"

So you have divided the entire range into 7 buckets. The prefixes
(also sometimes called salt) are used to distribute them row keys to
region servers. To scan the entire range as one large key space you
need to create 7 scanners:

1. scanner: start row: "h1_", end row "h2_"
2. scanner: start row: "h2_", end row "h3_"
3. scanner: start row: "h3_", end row "h4_"
4. scanner: start row: "h4_", end row "h5_"
5. scanner: start row: "h5_", end row "h6_"
6. scanner: start row: "h6_", end row "h7_"
7. scanner: start row: "h7_", end row ""

Now each of them gives you the first row that matches the start and
end row keys they are configured for. So you then take that first KV
they offer and add it to a list, sorted by kv.getRow() while removing
the hash prefix. For example, scanner 1 may have row "h1_1" to offer,
then split and drop the prefix "h1_" to get "1". The list then would
hold something like:

1. row "1" -> kv from scanner 1
2. row "1010" -> kv from scanner 2
3. row "2001" -> kv from scanner 3
4. row "3033" -> kv from scanner 4
5. row "4001" -> kv from scanner 5
6. row "5002" -> kv from scanner 6
7. row "6000" -> kv from scanner 7

(assuming that the keys are not contiguous but have gaps)

You then pop element #1 and do a "scanner1.next()" to get its next KV
offering. Then insert that into the list and you get

1. row "3" -> kv from scanner 1
2. row "1010" -> kv from scanner 2
3. row "2001" -> kv from scanner 3
4. row "3033" -> kv from scanner 4
5. row "4001" -> kv from scanner 5
6. row "5002" -> kv from scanner 6
7. row "6000" -> kv from scanner 7

Notice how you only ever have a list with N elements on the client
side, each representing the next value the scanners offer. Since the
list is sorted you always access item #1 and therefore the next in the
entire key space.

Once scanner 1 runs out you can close and remove it, the list will
then give you values from scanner 2 as the first elements in it. And
so on.
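
In code, a rough sketch of this merge could look like the following (the
table name "mytable" is an assumption of mine, the seven "h1_".."h7_"
prefixes are the ones from the example, and I am assuming fixed-width,
e.g. zero-padded, keys so that comparing the stripped strings gives the
order you want; a java.util.PriorityQueue plays the role of the sorted list):

import java.util.PriorityQueue;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedRangeScan {

  // One bucket scanner plus the row it currently offers, salt stripped.
  static class Bucket implements Comparable<Bucket> {
    final ResultScanner scanner;
    Result current;
    String strippedRow;

    Bucket(ResultScanner scanner, Result first) {
      this.scanner = scanner;
      advanceTo(first);
    }

    void advanceTo(Result r) {
      current = r;
      // Drop the three-character "hN_" prefix to recover the logical key.
      strippedRow = Bytes.toString(r.getRow()).substring(3);
    }

    public int compareTo(Bucket other) {
      return strippedRow.compareTo(other.strippedRow);
    }
  }

  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");
    int buckets = 7;
    PriorityQueue<Bucket> queue = new PriorityQueue<Bucket>();

    // Open one scanner per salt bucket, exactly as in the list above.
    for (int i = 1; i <= buckets; i++) {
      byte[] start = Bytes.toBytes("h" + i + "_");
      byte[] stop = (i < buckets) ? Bytes.toBytes("h" + (i + 1) + "_")
                                  : new byte[0]; // "" means scan to table end
      ResultScanner scanner = table.getScanner(new Scan(start, stop));
      Result first = scanner.next();
      if (first != null) {
        queue.add(new Bucket(scanner, first));
      } else {
        scanner.close(); // empty bucket, nothing to merge
      }
    }

    // Always consume the globally smallest key, then refill that scanner.
    while (!queue.isEmpty()) {
      Bucket top = queue.poll();
      System.out.println(top.strippedRow); // process top.current here
      Result next = top.scanner.next();
      if (next != null) {
        top.advanceTo(next);
        queue.add(top); // reinsert; the queue reorders by stripped key
      } else {
        top.scanner.close(); // this bucket ran out, drop it
      }
    }
    table.close();
  }
}

The queue never holds more than one Result per bucket, which is exactly
the N-element list from above.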

Makes more sense?

Lars

On Wed, Mar 16, 2011 at 12:09 PM, Eric Charles
<er...@u-mangate.com> wrote:
> Hi Lars,
> Are you talking about http://code.google.com/p/socorro/ ?
> I can find python scripts, but no jruby one...
>
> Aside from the hash function I could reuse, are you saying that range queries are
> possible even with hashed keys (randomly distributed)?
> (If possible with the script, it will also be possible from the hbase java
> client).
> Even with your explanation, I can't figure out how compound keys
> (hashedkey+key) can be range-queried.
>
> Tks,
> - Eric
>
> On 16/03/2011 11:38, Lars George wrote:
>>
>> Hi Eric,
>>
>> Mozilla Socorro uses an approach where they bucket ranges using
>> leading hashes to distribute them across servers. When you want to do
>> scans you need to create N scans, where N is the number of hashes and
>> then do a next() on each scanner, putting all KVs into one sorted list
>> (use the KeyComparator for example) while stripping the prefix hash
>> first. You can then access the rows in sorted order where the first
>> element in the list is the one with the first key to read. Once you
>> take off the first element (being the lowest KV key) you call next() on
>> the underlying scanner and reinsert it into the list, reordering it. You
>> keep taking from the top and therefore always see the entire range,
>> even if the same scanner would return the next logical rows to read.
>>
>> The shell is written in JRuby, so any function you can use there would
>> make sense to use in the prefix, then you could compute it on the fly.
>> This will not help with merging the bucketed key ranges, you need to
>> do this with the above approach in code. Though since this is JRuby
>> you could write that code in Ruby and add it to your local shell, giving
>> you what you need.
>>
>> Lars
>>
>> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>>
>>> Oops, forget my first question about range query (if keys are hashed,
>>> they
>>> can not be queried based on a range...)
>>> Still curious to have info on the hash function in the shell (2.) and
>>> advice
>>> on md5/jenkins/sha1 (3.)
>>> Tks,
>>> Eric
>>>
>>> On 16/03/2011 09:52, Eric Charles wrote:
>>>>
>>>> Hi,
>>>>
>>>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>>>
>>>> 1. I wonder if this strategy is advised for range queries (from/to key)
>>>> use case, because the rows will be randomly distributed in different
>>>> regions. Will it cause some performance loss?
>>>> 2. Is it possible to query from hbase shell with something like "get
>>>> 't1',
>>>> @hash('r1')", to let the shell compute the hash for you from the
>>>> readable
>>>> key.
>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>>> you
>>>> advise? What about SHA1?
>>>>
>>>> Tks,
>>>> - Eric
>>>>
>>>> PS: I searched the archive but didn't find the answers.
>>>>
>>>
>
>

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi Lars,
Are you talking about http://code.google.com/p/socorro/ ?
I can find python scripts, but no jruby one...

Aside from the hash function I could reuse, are you saying that range queries 
are possible even with hashed keys (randomly distributed)?
(If possible with the script, it will also be possible from the hbase 
java client).
Even with your explanation, I can't figure out how compound keys 
(hashedkey+key) can be range-queried.

Tks,
- Eric

On 16/03/2011 11:38, Lars George wrote:
> Hi Eric,
>
> Mozilla Socorro uses an approach where they bucket ranges using
> leading hashes to distribute them across servers. When you want to do
> scans you need to create N scans, where N is the number of hashes and
> then do a next() on each scanner, putting all KVs into one sorted list
> (use the KeyComparator for example) while stripping the prefix hash
> first. You can then access the rows in sorted order where the first
> element in the list is the one with the first key to read. Once you
> take off the first element (being the lowest KV key) you call next() on
> the underlying scanner and reinsert it into the list, reordering it. You
> keep taking from the top and therefore always see the entire range,
> even if the same scanner would return the next logical rows to read.
>
> The shell is written in JRuby, so any function you can use there would
> make sense to use in the prefix, then you could compute it on the fly.
> This will not help with merging the bucketed key ranges, you need to
> do this with the above approach in code. Though since this is JRuby
> you could write that code in Ruby and add it to your local shell, giving
> you what you need.
>
> Lars
>
> On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Oops, forget my first question about range query (if keys are hashed, they
>> can not be queried based on a range...)
>> Still curious to have info on the hash function in the shell (2.) and advice
>> on md5/jenkins/sha1 (3.)
>> Tks,
>> Eric
>>
>> On 16/03/2011 09:52, Eric Charles wrote:
>>> Hi,
>>>
>>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>>
>> 1. I wonder if this strategy is advised for range queries (from/to key)
>>> use case, because the rows will be randomly distributed in different
>> regions. Will it cause some performance loss?
>>> 2. Is it possible to query from hbase shell with something like "get 't1',
>>> @hash('r1')", to let the shell compute the hash for you from the readable
>>> key.
>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would you
>> advise? What about SHA1?
>>>
>>> Tks,
>>> - Eric
>>>
>>> PS: I searched the archive but didn't find the answers.
>>>
>>


Re: Hash keys

Posted by Lars George <la...@gmail.com>.
Hi Eric,

Mozilla Socorro uses an approach where they bucket ranges using
leading hashes to distribute them across servers. When you want to do
scans you need to create N scans, where N is the number of hashes and
then do a next() on each scanner, putting all KVs into one sorted list
(use the KeyComparator for example) while stripping the prefix hash
first. You can then access the rows in sorted order where the first
element in the list is the one with the first key to read. Once you
take off the first element (being the lowest KV key) you call next() on
the underlying scanner and reinsert it into the list, reordering it. You
keep taking from the top and therefore always see the entire range,
even if the same scanner would return the next logical rows to read.

The shell is written in JRuby, so any function you can use there would
make sense to use in the prefix, then you could compute it on the fly.
This will not help with merging the bucketed key ranges, you need to
do this with the above approach in code. Though since this is JRuby
you could write that code in Ruby and add it to your local shell, giving
you what you need.

Lars

On Wed, Mar 16, 2011 at 9:01 AM, Eric Charles
<er...@u-mangate.com> wrote:
> Oops, forget my first question about range query (if keys are hashed, they
> can not be queried based on a range...)
> Still curious to have info on the hash function in the shell (2.) and advice
> on md5/jenkins/sha1 (3.)
> Tks,
> Eric
>
> On 16/03/2011 09:52, Eric Charles wrote:
>>
>> Hi,
>>
>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>
>> 1. I wonder if this strategy is advised for range queries (from/to key)
>> use case, because the rows will be randomly distributed in different
>> regions. Will it cause some performance loss?
>> 2. Is it possible to query from hbase shell with something like "get 't1',
>> @hash('r1')", to let the shell compute the hash for you from the readable
>> key.
>> 3. There are MD5 and Jenkins classes in hbase.util package. What would you
>> advise? What about SHA1?
>>
>> Tks,
>> - Eric
>>
>> PS: I searched the archive but didn't find the answers.
>>
>
>

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Oops, forget my first question about range query (if keys are hashed, 
they can not be queried based on a range...)
Still curious to have info on the hash function in the shell (2.) and 
advice on md5/jenkins/sha1 (3.)
Tks,
Eric

On 16/03/2011 09:52, Eric Charles wrote:
> Hi,
>
> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>
> 1. I wonder if this strategy is advised for range queries (from/to 
> key) use case, because the rows will be randomly distributed in 
> different regions. Will it cause some performance loss?
> 2. Is it possible to query from hbase shell with something like "get 
> 't1', @hash('r1')", to let the shell compute the hash for you from the 
> readable key.
> 3. There are MD5 and Jenkins classes in hbase.util package. What would 
> you advise? What about SHA1?
>
> Tks,
> - Eric
>
> PS: I searched the archive but didn't find the answers.
>


Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
...and probably the additional hashing doesn't help the performance.
Eric


On 16/03/2011 19:17, Eric Charles wrote:
> A new laptop is definitely on my investment plan :)
> Tks,
> Eric
>
> On 16/03/2011 18:56, Harsh J wrote:
>> On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>> Cool.
>>> Everything is already available.
>> Great!
>>
>>> 1 row(s) in 0.0840 seconds
>>>> 1 row(s) in 0.0420 seconds
>> Interesting how your test's get time is exactly double that of my 
>> test ;-)
>>
>


Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
A new laptop is definitely on my investment plan :)
Tks,
Eric

On 16/03/2011 18:56, Harsh J wrote:
> On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Cool.
>> Everything is already available.
> Great!
>
>> 1 row(s) in 0.0840 seconds
>>> 1 row(s) in 0.0420 seconds
> Interesting how your test's get time is exactly double that of my test ;-)
>


Re: Hash keys

Posted by Harsh J <qw...@gmail.com>.
On Wed, Mar 16, 2011 at 8:36 PM, Eric Charles
<er...@u-mangate.com> wrote:
> Cool.
> Everything is already available.

Great!

> 1 row(s) in 0.0840 seconds
>> 1 row(s) in 0.0420 seconds

Interesting how your test's get time is exactly double that of my test ;-)

-- 
Harsh J
http://harshj.com

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Cool.

Everything is already available.
I simply have to import MD5Hash and use the to_java_bytes ruby function.

hbase(main):001:0> import org.apache.hadoop.hbase.util.MD5Hash
=> Java::OrgApacheHadoopHbaseUtil::MD5Hash
hbase(main):002:0> put 'test', MD5Hash.getMD5AsHex('row1'.to_java_bytes), 'cf:a', 'value1'
0 row(s) in 0.5880 seconds
hbase(main):004:0> get 'test', 'row1'
COLUMN                                           CELL
0 row(s) in 0.0170 seconds
hbase(main):003:0> get 'test', MD5Hash.getMD5AsHex('row1'.to_java_bytes)
COLUMN                                           CELL
  cf:a                                             timestamp=1300287899911, value=value1
1 row(s) in 0.0840 seconds
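
And the same round trip also works from a plain Java client. A small 
sketch, assuming the same table "test", family "cf", and qualifier "a" 
as in the shell session above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.hbase.util.MD5Hash;

public class HashedKeyRoundTrip {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "test");

    // Same convention as the shell session: store under the MD5 hex of the
    // readable key, and recompute the hash for every lookup.
    byte[] row = Bytes.toBytes(MD5Hash.getMD5AsHex(Bytes.toBytes("row1")));

    Put put = new Put(row);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("a"), Bytes.toBytes("value1"));
    table.put(put);

    Result result = table.get(new Get(row));
    System.out.println(Bytes.toString(
        result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("a"))));
    table.close();
  }
}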

Many tks,

Eric


On 16/03/2011 15:44, Harsh J wrote:
> Using Java classes itself is possible from within HBase shell (since
> it is JRuby), but yes some Ruby knowledge should be helpful too!
>
> For instance, I can use java.lang.String by simply importing it:
>
> hbase(main):004:0>  import java.lang.String
> =>  Java::JavaLang::String
> hbase(main):004:0>  get String.new('test'), String.new('row1')
> COLUMN                                 CELL
>
>   f:a                                   timestamp=1300170063837, value=val4
>
> 1 row(s) in 0.0420 seconds
>
> On Wed, Mar 16, 2011 at 4:26 PM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Hi,
>> I understand from your answer that it's possible but not available.
>> Has anyone already implemented such functionality?
>> If not, where should I begin looking (hirb.rb, any tutorial,... ?) - I
>> know nothing about jruby.
>> Tks,
>> - Eric
>>
>> On 16/03/2011 10:39, Harsh J wrote:
>>> (For 2) I think the hash function should work in the shell if it
>>> returns a string type (like what '' defines in-place).
>>>
>>> On Wed, Mar 16, 2011 at 2:22 PM, Eric Charles
>>> <er...@u-mangate.com>    wrote:
>>>> Hi,
>>>>
>>>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>>>
>>>> 1. I wonder if this strategy is advised for range queries (from/to key)
>>>> use
>>>> case, because the rows will be randomly distributed in different regions.
>>>> Will it cause some performance loss?
>>>> 2. Is it possible to query from hbase shell with something like "get
>>>> 't1',
>>>> @hash('r1')", to let the shell compute the hash for you from the readable
>>>> key.
>>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>>> you
>>>> advise? What about SHA1?
>>>>
>>>> Tks,
>>>> - Eric
>>>>
>>>> PS: I searched the archive but didn't find the answers.
>>>>
>>>>
>>>
>>
>
>


Re: Hash keys

Posted by Harsh J <qw...@gmail.com>.
Using Java classes itself is possible from within HBase shell (since
it is JRuby), but yes some Ruby knowledge should be helpful too!

For instance, I can use java.lang.String by simply importing it:

hbase(main):004:0> import java.lang.String
=> Java::JavaLang::String
hbase(main):004:0> get String.new('test'), String.new('row1')
COLUMN                                 CELL

 f:a                                   timestamp=1300170063837, value=val4

1 row(s) in 0.0420 seconds

On Wed, Mar 16, 2011 at 4:26 PM, Eric Charles
<er...@u-mangate.com> wrote:
> Hi,
> I understand from your answer that it's possible but not available.
> Has anyone already implemented such functionality?
> If not, where should I begin looking (hirb.rb, any tutorial,... ?) - I
> know nothing about jruby.
> Tks,
> - Eric
>
> On 16/03/2011 10:39, Harsh J wrote:
>>
>> (For 2) I think the hash function should work in the shell if it
>> returns a string type (like what '' defines in-place).
>>
>> On Wed, Mar 16, 2011 at 2:22 PM, Eric Charles
>> <er...@u-mangate.com>  wrote:
>>>
>>> Hi,
>>>
>>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>>
>>> 1. I wonder if this strategy is advised for range queries (from/to key)
>>> use
>>> case, because the rows will be randomly distributed in different regions.
>>> Will it cause some performance loss?
>>> 2. Is it possible to query from hbase shell with something like "get
>>> 't1',
>>> @hash('r1')", to let the shell compute the hash for you from the readable
>>> key.
>>> 3. There are MD5 and Jenkins classes in hbase.util package. What would
>>> you
>>> advise? What about SHA1?
>>>
>>> Tks,
>>> - Eric
>>>
>>> PS: I searched the archive but didn't find the answers.
>>>
>>>
>>
>>
>
>



-- 
Harsh J
http://harshj.com

Re: Hash keys

Posted by Eric Charles <er...@u-mangate.com>.
Hi,
I understand from your answer that it's possible but not available.
Has anyone already implemented such functionality?
If not, where should I begin looking (hirb.rb, any tutorial,... ?) - 
I know nothing about jruby.
Tks,
- Eric

On 16/03/2011 10:39, Harsh J wrote:
> (For 2) I think the hash function should work in the shell if it
> returns a string type (like what '' defines in-place).
>
> On Wed, Mar 16, 2011 at 2:22 PM, Eric Charles
> <er...@u-mangate.com>  wrote:
>> Hi,
>>
>> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>>
>> 1. I wonder if this strategy is advised for range queries (from/to key) use
>> case, because the rows will be randomly distributed in different regions.
>> Will it cause some performance loss?
>> 2. Is it possible to query from hbase shell with something like "get 't1',
>> @hash('r1')", to let the shell compute the hash for you from the readable
>> key.
>> 3. There are MD5 and Jenkins classes in hbase.util package. What would you
>> advise? What about SHA1?
>>
>> Tks,
>> - Eric
>>
>> PS: I searched the archive but didn't find the answers.
>>
>>
>
>


Re: Hash keys

Posted by Harsh J <qw...@gmail.com>.
(For 2) I think the hash function should work in the shell if it
returns a string type (like what '' defines in-place).

On Wed, Mar 16, 2011 at 2:22 PM, Eric Charles
<er...@u-mangate.com> wrote:
> Hi,
>
> To help avoid hotspots, I'm planning to use hashed keys in some tables.
>
> 1. I wonder if this strategy is advised for range queries (from/to key) use
> case, because the rows will be randomly distributed in different regions.
> Will it cause some performance loss?
> 2. Is it possible to query from hbase shell with something like "get 't1',
> @hash('r1')", to let the shell compute the hash for you from the readable
> key.
> 3. There are MD5 and Jenkins classes in hbase.util package. What would you
> advise? What about SHA1?
>
> Tks,
> - Eric
>
> PS: I searched the archive but didn't find the answers.
>
>



-- 
Harsh J
http://harshj.com