You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hbase.apache.org by Mingtao Zhang <ma...@gmail.com> on 2014/07/21 16:18:43 UTC

Rowkey, Consistant Hashing, MD5?

Hi,

I am trying to find a consistant hasing algorithm for the first portion of
the row key.

I saw the document/book that MD5 is mentioned everything.

But I have trouble to persuade myself that MD5 (
http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.

Could any of you point me to the library contains the hashing you are using?

Thanks in advance!

Best Regards,
Mingtao

Re: Rowkey, Consistant Hashing, MD5?

Posted by Jonathan Hsieh <jo...@cloudera.com>.
Just to be clear, are you sure you need consistent hashing for your row
key? These kinds of hashes are generally used for systems where load
balancing is done via distributed hashtables.  HBase doesn't need this
internally since it does range partitioning (and thus load balancing
doesn't need a hash).

You could implement you own consistent has built upon md5.  I hacked
together a version of this that used md5 a long time ago here [1], [2].
 YMMV.

[1]
https://github.com/jmhsieh/algorithms/tree/master/src/main/java/org/jmhsieh/sets/consistant
[2]
https://github.com/jmhsieh/algorithms/tree/master/src/test/java/org/jmhsieh/sets/consistant

Jon.


On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <ma...@gmail.com>
wrote:

> Hi,
>
> I am trying to find a consistant hasing algorithm for the first portion of
> the row key.
>
> I saw the document/book that MD5 is mentioned everything.
>
> But I have trouble to persuade myself that MD5 (
> http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.
>
> Could any of you point me to the library contains the hashing you are
> using?
>
> Thanks in advance!
>
> Best Regards,
> Mingtao
>



-- 
// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh

Re: Rowkey, Consistant Hashing, MD5?

Posted by Mingtao Zhang <ma...@gmail.com>.
Thank you all. Moved to Murmur hash.

Best Regards,
Mingtao


On Mon, Jul 21, 2014 at 10:58 PM, Ishan Chhabra <ic...@rocketfuel.com>
wrote:

> No *guarantees* on collision, but yes, it is a deterministic mapping and
> you won't see collisions in that range (provided you choose enough bits).
>
> See MurmurHash here: http://en.wikipedia.org/wiki/MurmurHash
>
> and to understand collision probabilities, read this:
> http://en.wikipedia.org/wiki/Birthday_problem
>
>
> On Mon, Jul 21, 2014 at 7:55 PM, Mingtao Zhang <ma...@gmail.com>
> wrote:
>
> > Thank you all!
> >
> > Sorry, I think 'consistent hashing' is wrong word.
> >
> > For my use case, I need to store this 'prefix' (either hashed/not) into
> > another table.
> >
> > Will this murmur hashing guarantee next time same string will map to same
> > bytes? And no collision for around 2^10 records?
> >
> > Mingtao Sent from iPhone
> >
> > > On Jul 21, 2014, at 10:28 PM, Ishan Chhabra <ic...@rocketfuel.com>
> > wrote:
> > >
> > > Mingtao,
> > > If I understand correctly, you want to prefix the key with a hash (as
> > > mentioned in the book) to get a good distribution. Use MurmurHash
> (there
> > is
> > > an implementation in HBase code itself) as it is fast and gives a
> uniform
> > > distribution.
> > >
> > > "Consistent Hashing" is not the correct term to use here if I
> understand
> > > your intent correctly.
> > >
> > >
> > >> On Mon, Jul 21, 2014 at 2:44 PM, Liam Slusser <ls...@gmail.com>
> > wrote:
> > >>
> > >> MD5 isn't a consistent hashing algorithm.  Consistent hashing is a
> > scheme
> > >> that provides a hash table functionality in a way that the adding or
> > >> removing of one slot does not significantly change the mapping of keys
> > to
> > >> slots.  With that said, a lot of consistent hashing algorithms USE
> > >> md5...but it alone won't get you all the way there.
> > >>
> > >> Some light bedtime reading:
> > >> http://en.wikipedia.org/wiki/Consistent_hashing
> > >>
> > >> liam
> > >>
> > >>
> > >> On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <
> mail2mingtao@gmail.com>
> > >> wrote:
> > >>
> > >>> Hi,
> > >>>
> > >>> I am trying to find a consistant hasing algorithm for the first
> portion
> > >> of
> > >>> the row key.
> > >>>
> > >>> I saw the document/book that MD5 is mentioned everything.
> > >>>
> > >>> But I have trouble to persuade myself that MD5 (
> > >>> http://en.wikipedia.org/wiki/MD5) is considered as consistant
> hasing.
> > >>>
> > >>> Could any of you point me to the library contains the hashing you are
> > >>> using?
> > >>>
> > >>> Thanks in advance!
> > >>>
> > >>> Best Regards,
> > >>> Mingtao
> > >
> > >
> > >
> > > --
> > > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
> >
>
>
>
> --
> *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
>

Re: Rowkey, Consistant Hashing, MD5?

Posted by Ishan Chhabra <ic...@rocketfuel.com>.
No *guarantees* on collision, but yes, it is a deterministic mapping and
you won't see collisions in that range (provided you choose enough bits).

See MurmurHash here: http://en.wikipedia.org/wiki/MurmurHash

and to understand collision probabilities, read this:
http://en.wikipedia.org/wiki/Birthday_problem


On Mon, Jul 21, 2014 at 7:55 PM, Mingtao Zhang <ma...@gmail.com>
wrote:

> Thank you all!
>
> Sorry, I think 'consistent hashing' is wrong word.
>
> For my use case, I need to store this 'prefix' (either hashed/not) into
> another table.
>
> Will this murmur hashing guarantee next time same string will map to same
> bytes? And no collision for around 2^10 records?
>
> Mingtao Sent from iPhone
>
> > On Jul 21, 2014, at 10:28 PM, Ishan Chhabra <ic...@rocketfuel.com>
> wrote:
> >
> > Mingtao,
> > If I understand correctly, you want to prefix the key with a hash (as
> > mentioned in the book) to get a good distribution. Use MurmurHash (there
> is
> > an implementation in HBase code itself) as it is fast and gives a uniform
> > distribution.
> >
> > "Consistent Hashing" is not the correct term to use here if I understand
> > your intent correctly.
> >
> >
> >> On Mon, Jul 21, 2014 at 2:44 PM, Liam Slusser <ls...@gmail.com>
> wrote:
> >>
> >> MD5 isn't a consistent hashing algorithm.  Consistent hashing is a
> scheme
> >> that provides a hash table functionality in a way that the adding or
> >> removing of one slot does not significantly change the mapping of keys
> to
> >> slots.  With that said, a lot of consistent hashing algorithms USE
> >> md5...but it alone won't get you all the way there.
> >>
> >> Some light bedtime reading:
> >> http://en.wikipedia.org/wiki/Consistent_hashing
> >>
> >> liam
> >>
> >>
> >> On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <ma...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> I am trying to find a consistant hasing algorithm for the first portion
> >> of
> >>> the row key.
> >>>
> >>> I saw the document/book that MD5 is mentioned everything.
> >>>
> >>> But I have trouble to persuade myself that MD5 (
> >>> http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.
> >>>
> >>> Could any of you point me to the library contains the hashing you are
> >>> using?
> >>>
> >>> Thanks in advance!
> >>>
> >>> Best Regards,
> >>> Mingtao
> >
> >
> >
> > --
> > *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.
>



-- 
*Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.

Re: Rowkey, Consistant Hashing, MD5?

Posted by Mingtao Zhang <ma...@gmail.com>.
Thank you all!

Sorry, I think 'consistent hashing' is wrong word.

For my use case, I need to store this 'prefix' (either hashed/not) into another table.

Will this murmur hashing guarantee next time same string will map to same bytes? And no collision for around 2^10 records?

Mingtao Sent from iPhone

> On Jul 21, 2014, at 10:28 PM, Ishan Chhabra <ic...@rocketfuel.com> wrote:
> 
> Mingtao,
> If I understand correctly, you want to prefix the key with a hash (as
> mentioned in the book) to get a good distribution. Use MurmurHash (there is
> an implementation in HBase code itself) as it is fast and gives a uniform
> distribution.
> 
> "Consistent Hashing" is not the correct term to use here if I understand
> your intent correctly.
> 
> 
>> On Mon, Jul 21, 2014 at 2:44 PM, Liam Slusser <ls...@gmail.com> wrote:
>> 
>> MD5 isn't a consistent hashing algorithm.  Consistent hashing is a scheme
>> that provides a hash table functionality in a way that the adding or
>> removing of one slot does not significantly change the mapping of keys to
>> slots.  With that said, a lot of consistent hashing algorithms USE
>> md5...but it alone won't get you all the way there.
>> 
>> Some light bedtime reading:
>> http://en.wikipedia.org/wiki/Consistent_hashing
>> 
>> liam
>> 
>> 
>> On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <ma...@gmail.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I am trying to find a consistant hasing algorithm for the first portion
>> of
>>> the row key.
>>> 
>>> I saw the document/book that MD5 is mentioned everything.
>>> 
>>> But I have trouble to persuade myself that MD5 (
>>> http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.
>>> 
>>> Could any of you point me to the library contains the hashing you are
>>> using?
>>> 
>>> Thanks in advance!
>>> 
>>> Best Regards,
>>> Mingtao
> 
> 
> 
> -- 
> *Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.

Re: Rowkey, Consistant Hashing, MD5?

Posted by Ishan Chhabra <ic...@rocketfuel.com>.
Mingtao,
If I understand correctly, you want to prefix the key with a hash (as
mentioned in the book) to get a good distribution. Use MurmurHash (there is
an implementation in HBase code itself) as it is fast and gives a uniform
distribution.

"Consistent Hashing" is not the correct term to use here if I understand
your intent correctly.


On Mon, Jul 21, 2014 at 2:44 PM, Liam Slusser <ls...@gmail.com> wrote:

> MD5 isn't a consistent hashing algorithm.  Consistent hashing is a scheme
> that provides a hash table functionality in a way that the adding or
> removing of one slot does not significantly change the mapping of keys to
> slots.  With that said, a lot of consistent hashing algorithms USE
> md5...but it alone won't get you all the way there.
>
> Some light bedtime reading:
> http://en.wikipedia.org/wiki/Consistent_hashing
>
> liam
>
>
> On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <ma...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am trying to find a consistant hasing algorithm for the first portion
> of
> > the row key.
> >
> > I saw the document/book that MD5 is mentioned everything.
> >
> > But I have trouble to persuade myself that MD5 (
> > http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.
> >
> > Could any of you point me to the library contains the hashing you are
> > using?
> >
> > Thanks in advance!
> >
> > Best Regards,
> > Mingtao
> >
>



-- 
*Ishan Chhabra *| Rocket Scientist | RocketFuel Inc.

Re: Rowkey, Consistant Hashing, MD5?

Posted by Liam Slusser <ls...@gmail.com>.
MD5 isn't a consistent hashing algorithm.  Consistent hashing is a scheme
that provides a hash table functionality in a way that the adding or
removing of one slot does not significantly change the mapping of keys to
slots.  With that said, a lot of consistent hashing algorithms USE
md5...but it alone won't get you all the way there.

Some light bedtime reading: http://en.wikipedia.org/wiki/Consistent_hashing

liam


On Mon, Jul 21, 2014 at 7:18 AM, Mingtao Zhang <ma...@gmail.com>
wrote:

> Hi,
>
> I am trying to find a consistant hasing algorithm for the first portion of
> the row key.
>
> I saw the document/book that MD5 is mentioned everything.
>
> But I have trouble to persuade myself that MD5 (
> http://en.wikipedia.org/wiki/MD5) is considered as consistant hasing.
>
> Could any of you point me to the library contains the hashing you are
> using?
>
> Thanks in advance!
>
> Best Regards,
> Mingtao
>