You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2009/04/01 23:31:58 UTC

Re: OPHF

Avinash already commited his new order-preserving hash function and I
missed it.  It's in OrderPreservingHashPartitioner.  It takes the
approach that Todd and I discussed back in January: turn the string
into a base-Char.MAX_VALUE number.
(http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed).
 I chatted with Avinash a little on IM but we didn't finish, so I'm
picking it up here.

There are two problems with this approach.  One is that the hashes
will only be order-preserving for a subset of unicode (all of UCS-2,
but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
 The other is that this only gives you a naive ordering by code point
value, which for unicode is not what you want and even for ascii
sometimes you want another collation.  (see
http://www.unicode.org/reports/tr10/ and
http://java.sun.com/javase/6/docs/api/java/text/Collator.html).

Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
you are going to do range queries on them.  The correct
collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
char value gives ['--a', '--b', 'a', 'b'].

Switching to a more flexibile system like the one I wrote for
CASSANDRA-3 will let use use Token<BigInteger> for random distribution
or Token<String> for order-preserving, with user-defined collation.  I
don't see a way to get this kind of flexibility from an approach that
insists on turning everything into BigInteger.

-Jonathan

On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> Avinash,
>
> You mentioned that you have a new order-preserving hash function that
> you think will be more generally useful.  Can you post it?
>
> thanks,
>
> -Jonathan
>

Re: OPHF

Posted by Zhu Han <sc...@gmail.com>.

I see. Thanks!

best regards,
hanzhu


On Sun, Apr 5, 2009 at 12:24 PM, Jonathan ellis <jb...@gmail.com> wrote:

> Note that Avinash's ophf is 1-1 up to the max key length.  But this has its
> own problems.  (Mainly, that keys cannot be arbitrarily long, and hash
> computation gets more expensive proportional to max key length.)
>
> -Jonathan
>
>
> On Apr 4, 2009, at 10:32 PM, Zhu Han <sc...@gmail.com> wrote:
>
>  Avinash,
>>
>> (For cassandra, the "key" mentioned in following words is the name of the
>> row. The "token" mentioned in following words, is the output of the hash
>> function. )
>>
>> For example, if you use 160bit md5 hash value as the space, it's finite
>> theoretically, because you know the number of tokens which can be
>> contained
>> ahead of time, although the number is pretty large. For finite hash space,
>> the mapping from key to token is many to one instead of one to one.
>>
>> For OHPF, a lot of keys could be mapped to  the same token range because
>> of
>> the non-uniform distribution just as you pointed out. The load balancing
>> can
>> solve the problem. However, if too may keys mapped to *the same token*
>> which is determined by the hash function, how can the load balance be
>> carried out? When the hash space is finite, it's possible that a large
>> number of keys are mapped to the same token. However, if the hash function
>> is uniformly distributed, just as MD5 or SHA1, the probability of the
>> worse
>> case is pretty rare, at least this case  cannot be produced by human
>> purposely,  so that's not a problem. But for OPHF, if the clients pick up
>> the name of key purposely, e.g. for DOS attack, the worst case can occur
>> practically.  When the attacker knows the algorithm of the OPHF, he can
>> produce arbitrary number of keys mapped to the same token.
>>
>> If the hash space is infinite, or it's as large as the possible space of
>> keys, this would not be a problem, because the key to token mapping is one
>> to one instead of many to one .
>>
>> best regards,
>> hanzhu
>>
>>
>> On Sat, Apr 4, 2009 at 11:59 PM, Avinash Lakshman <
>> avinash.lakshman@gmail.com> wrote:
>>
>>  Not sure what you mean by infinite. OPHF or not any hash function's range
>>> is
>>> practically infinite for MD5 is infinite hash range. With OPHF you will
>>> have
>>> a lot of skew. There are only two ways to deal with it:
>>> (1) You know the space of of all your keys ahead of time and have your
>>> system exploit that.
>>> (2) Dynamically load balance until your system reaches steady state.
>>>
>>> We are always sorting lexicographically and trying to do both of the
>>> above.
>>>
>>> Avinash
>>>
>>> On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <sc...@gmail.com> wrote:
>>>
>>>  Avinash,
>>>>
>>>> I support the proposal of Jonathan.
>>>>
>>>> Theoretically,  if the system wants to support range query over
>>>>
>>> consistent
>>>
>>>> hash(DHT),  the hash space should be infinite. Otherwise,  a large
>>>> number
>>>> of
>>>> keys could be mapped to the same token and  it's impossible to do load
>>>> balance by moving these keys around to other physical servers.
>>>>
>>> Especially,
>>>
>>>> the churn of OPHF could be very high because the user would like to
>>>>
>>> create
>>>
>>>> a
>>>> lot of keys with the same long prefix for some applications. If the
>>>>
>>> system
>>>
>>>> takes the string as the token and sorts them in *lexicographic order,
>>>> the
>>>> hash space would be infinite, so that the churn can be solved by load
>>>> balacing under any circumstances.
>>>>
>>>> *best regards,
>>>> hanzhu
>>>>
>>>>
>>>> On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
>>>> avinash.lakshman@gmail.com> wrote:
>>>>
>>>>  So how are you coming up with the tokens here? What do you mean by
>>>>> string[0]
>>>>> and string[last]? Are they the keys that belong to the system? If so
>>>>>
>>>> then
>>>
>>>> that is not a mechanism to reason about the range of your hash
>>>>>
>>>> function.
>>>
>>>> In
>>>>
>>>>> this scheme does it mean you will need to sort the tokens too using
>>>>> collation scheme used by the external partitioner while identifying
>>>>>
>>>> which
>>>
>>>> key goes to which node. Also could you provide me with the patch
>>>>>
>>>> number.
>>>
>>>> I
>>>>
>>>>> need to go over that this weekend and make sure if it does not affect
>>>>>
>>>> the
>>>
>>>> (bootstrap/load balance) logic. My fear is that these changes have long
>>>>> reaching effects and I need to make sure that all these pieces will
>>>>> continue
>>>>> to reside in harmony. Also your range query test did it run in a
>>>>> distributed
>>>>> environment or in a single box environment?
>>>>> Avinash
>>>>>
>>>>> On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com>
>>>>>
>>>> wrote:
>>>>
>>>>>
>>>>>  So with a String ring you just sort the strings (conceptually) and
>>>>>>
>>>>> you
>>>
>>>> wrap around from strings[last] to strings[0].  The range is entirely
>>>>>> defined by what tokens you have.
>>>>>>
>>>>>> It's nice to be able to test against a real application.  Mine won't
>>>>>> run without range queries...  (So it is still running against my
>>>>>> github code where I first wrote range queries, but that uses the OPHF
>>>>>> approach you have and so that is where I found the problems.)
>>>>>>
>>>>>> -Jonathan
>>>>>>
>>>>>> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
>>>>>> <av...@gmail.com> wrote:
>>>>>>
>>>>>>> I will look into this. Another point about P2P rings is something
>>>>>>>
>>>>>> about
>>>>
>>>>> the
>>>>>>
>>>>>>> ability to reason about the system. In any hash fn we hve a notion
>>>>>>>
>>>>>> of
>>>
>>>> range
>>>>>>
>>>>>>> and how the ring wraps around. For eg. in random hash such as MD5
>>>>>>>
>>>>>> we
>>>
>>>> say
>>>>>
>>>>>> 0-2^128 -1 and then the wrap around. Similarly for the hash fn used
>>>>>>>
>>>>>> here
>>>>>
>>>>>> one
>>>>>>
>>>>>>> could define the same. Today we perform usual sort of strings and
>>>>>>>
>>>>>> the
>>>
>>>> hash
>>>>>>
>>>>>>> should respect that sort order. I guess you are saying the usual
>>>>>>>
>>>>>> sort
>>>
>>>> of
>>>>>
>>>>>> string does not respect collation. I will look into this and I
>>>>>>>
>>>>>> insist
>>>
>>>> this
>>>>>>
>>>>>>> should be very doable. Changes to the implementation of the core
>>>>>>>
>>>>>> classes
>>>>>
>>>>>> albeit how simple they are very very scary unless they are
>>>>>>>
>>>>>> completely
>>>
>>>> tested
>>>>>>
>>>>>>> too in a distributed setting. Over here we do not have detailed
>>>>>>>
>>>>>> test
>>>
>>>> code,
>>>>>>
>>>>>>> but we test by directing a % of the site traffic to a test cluster
>>>>>>>
>>>>>> before
>>>>>
>>>>>> we
>>>>>>
>>>>>>> sign off on anything.
>>>>>>>
>>>>>>> Avinash
>>>>>>>
>>>>>>> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
>>>>>>>
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>  Say for instance you have inserted keys ['a', 'b', '--a', '--b']
>>>>>>>>
>>>>>>> and
>>>
>>>> you are going to do range queries on them.  The correct
>>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>>
>>>>>>>> Attached is a test program illustrating this.  The assert will
>>>>>>>>
>>>>>>> fail
>>>
>>>> because the hash ordering is not the same as the collator's.
>>>>>>>>
>>>>>>>> -Jonathan
>>>>>>>>
>>>>>>>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>>>>>>>> <av...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> In fact what would you want is hash to respect oorder based on
>>>>>>>>>
>>>>>>>> how
>>>
>>>> the
>>>>>
>>>>>> strings are sorted. For the example you have it does. So am I
>>>>>>>>>
>>>>>>>> missing
>>>>>
>>>>>> something?
>>>>>>>>>
>>>>>>>>> Avinash
>>>>>>>>>
>>>>>>>>> On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <
>>>>>>>>>
>>>>>>>> jbellis@gmail.com
>>>
>>>>
>>>>>  wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>  For that aspect no difference between a String ring based on
>>>>>>>>>>
>>>>>>>>> compareTo
>>>>>>
>>>>>>> and a BigInteger one.  The only difference (and it is an
>>>>>>>>>>
>>>>>>>>> important
>>>>
>>>>> one
>>>>>>
>>>>>>> for the reasons I gave!) is how the compare works.  But for the
>>>>>>>>>>
>>>>>>>>> p2p
>>>>
>>>>> aspect it does not matter.
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>>>>>>>>>> <av...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Doing what you are suggesting scares the hell out of me for a
>>>>>>>>>>>
>>>>>>>>>> couple
>>>>>>
>>>>>>> of
>>>>>>>>
>>>>>>>>> reasons -  All work in P2P be it random/OPHF does the token
>>>>>>>>>>>
>>>>>>>>>> handling
>>>>>>
>>>>>>> the
>>>>>>>>
>>>>>>>>> way
>>>>>>>>>>
>>>>>>>>>>> it is setup. I cannot try something that has not been well
>>>>>>>>>>>
>>>>>>>>>> explored
>>>>>
>>>>>> in
>>>>>>
>>>>>>> academia. I insist this must be doable. I am going to think
>>>>>>>>>>>
>>>>>>>>>> about
>>>>
>>>>> this
>>>>>>
>>>>>>> more.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Avinash
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
>>>>>>>>>>>
>>>>>>>>>> jbellis@gmail.com>
>>>>>
>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>  Avinash already commited his new order-preserving hash
>>>>>>>>>>>>
>>>>>>>>>>> function
>>>>
>>>>> and I
>>>>>>
>>>>>>> missed it.  It's in OrderPreservingHashPartitioner.  It
>>>>>>>>>>>>
>>>>>>>>>>> takes
>>>
>>>> the
>>>>>
>>>>>> approach that Todd and I discussed back in January: turn the
>>>>>>>>>>>>
>>>>>>>>>>> string
>>>>>>
>>>>>>> into a base-Char.MAX_VALUE number.
>>>>>>>>>>>> (
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>>>
>>>> ).
>>>>>>>>>>>> I chatted with Avinash a little on IM but we didn't finish,
>>>>>>>>>>>>
>>>>>>>>>>> so
>>>>
>>>>> I'm
>>>>>>
>>>>>>> picking it up here.
>>>>>>>>>>>>
>>>>>>>>>>>> There are two problems with this approach.  One is that the
>>>>>>>>>>>>
>>>>>>>>>>> hashes
>>>>>
>>>>>> will only be order-preserving for a subset of unicode (all
>>>>>>>>>>>>
>>>>>>>>>>> of
>>>
>>>> UCS-2,
>>>>>>
>>>>>>> but not all of UTF-16; see
>>>>>>>>>>>>
>>>>>>>>>>> http://en.wikipedia.org/wiki/UTF-16/UCS-2
>>>>>>
>>>>>>> ).
>>>>>>>>
>>>>>>>>> The other is that this only gives you a naive ordering by
>>>>>>>>>>>>
>>>>>>>>>>> code
>>>>
>>>>> point
>>>>>>
>>>>>>> value, which for unicode is not what you want and even for
>>>>>>>>>>>>
>>>>>>>>>>> ascii
>>>>
>>>>> sometimes you want another collation.  (see
>>>>>>>>>>>> http://www.unicode.org/reports/tr10/ and
>>>>>>>>>>>>
>>>>>>>>>>>>  http://java.sun.com/javase/6/docs/api/java/text/Collator.html
>>>
>>>> ).
>>>>
>>>>>
>>>>>>>>>>>> Say for instance you have inserted keys ['a', 'b', '--a',
>>>>>>>>>>>>
>>>>>>>>>>> '--b']
>>>>
>>>>> and
>>>>>>
>>>>>>> you are going to do range queries on them.  The correct
>>>>>>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But
>>>>>>>>>>>>
>>>>>>>>>>> ordering
>>>>
>>>>> by
>>>>>
>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>>>>>>
>>>>>>>>>>>> Switching to a more flexibile system like the one I wrote
>>>>>>>>>>>>
>>>>>>>>>>> for
>>>
>>>> CASSANDRA-3 will let use use Token<BigInteger> for random
>>>>>>>>>>>>
>>>>>>>>>>> distribution
>>>>>>>>
>>>>>>>>> or Token<String> for order-preserving, with user-defined
>>>>>>>>>>>>
>>>>>>>>>>> collation.
>>>>>>
>>>>>>> I
>>>>>>>>
>>>>>>>>> don't see a way to get this kind of flexibility from an
>>>>>>>>>>>>
>>>>>>>>>>> approach
>>>>
>>>>> that
>>>>>>
>>>>>>> insists on turning everything into BigInteger.
>>>>>>>>>>>>
>>>>>>>>>>>> -Jonathan
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
>>>>>>>>>>>>
>>>>>>>>>>> jbellis@gmail.com>
>>>>>>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Avinash,
>>>>>>>>>>>>>
>>>>>>>>>>>>> You mentioned that you have a new order-preserving hash
>>>>>>>>>>>>>
>>>>>>>>>>>> function
>>>>>
>>>>>> that
>>>>>>>>
>>>>>>>>> you think will be more generally useful.  Can you post it?
>>>>>>>>>>>>>
>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Jonathan
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>

Re: OPHF

Posted by Jonathan ellis <jb...@gmail.com>.

Note that Avinash's ophf is 1-1 up to the max key length.  But this  
has its own problems.  (Mainly, that keys cannot be arbitrarily long,  
and hash computation gets more expensive proportional to max key  
length.)

-Jonathan

On Apr 4, 2009, at 10:32 PM, Zhu Han <sc...@gmail.com> wrote:

> Avinash,
>
> (For cassandra, the "key" mentioned in following words is the name  
> of the
> row. The "token" mentioned in following words, is the output of the  
> hash
> function. )
>
> For example, if you use 160bit md5 hash value as the space, it's  
> finite
> theoretically, because you know the number of tokens which can be  
> contained
> ahead of time, although the number is pretty large. For finite hash  
> space,
> the mapping from key to token is many to one instead of one to one.
>
> For OHPF, a lot of keys could be mapped to  the same token range  
> because of
> the non-uniform distribution just as you pointed out. The load  
> balancing can
> solve the problem. However, if too may keys mapped to *the same token*
> which is determined by the hash function, how can the load balance be
> carried out? When the hash space is finite, it's possible that a large
> number of keys are mapped to the same token. However, if the hash  
> function
> is uniformly distributed, just as MD5 or SHA1, the probability of  
> the worse
> case is pretty rare, at least this case  cannot be produced by human
> purposely,  so that's not a problem. But for OPHF, if the clients  
> pick up
> the name of key purposely, e.g. for DOS attack, the worst case can  
> occur
> practically.  When the attacker knows the algorithm of the OPHF, he  
> can
> produce arbitrary number of keys mapped to the same token.
>
> If the hash space is infinite, or it's as large as the possible  
> space of
> keys, this would not be a problem, because the key to token mapping  
> is one
> to one instead of many to one .
>
> best regards,
> hanzhu
>
>
> On Sat, Apr 4, 2009 at 11:59 PM, Avinash Lakshman <
> avinash.lakshman@gmail.com> wrote:
>
>> Not sure what you mean by infinite. OPHF or not any hash function's  
>> range
>> is
>> practically infinite for MD5 is infinite hash range. With OPHF you  
>> will
>> have
>> a lot of skew. There are only two ways to deal with it:
>> (1) You know the space of of all your keys ahead of time and have  
>> your
>> system exploit that.
>> (2) Dynamically load balance until your system reaches steady state.
>>
>> We are always sorting lexicographically and trying to do both of  
>> the above.
>>
>> Avinash
>>
>> On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <sc...@gmail.com>  
>> wrote:
>>
>>> Avinash,
>>>
>>> I support the proposal of Jonathan.
>>>
>>> Theoretically,  if the system wants to support range query over
>> consistent
>>> hash(DHT),  the hash space should be infinite. Otherwise,  a large  
>>> number
>>> of
>>> keys could be mapped to the same token and  it's impossible to do  
>>> load
>>> balance by moving these keys around to other physical servers.
>> Especially,
>>> the churn of OPHF could be very high because the user would like to
>> create
>>> a
>>> lot of keys with the same long prefix for some applications. If the
>> system
>>> takes the string as the token and sorts them in *lexicographic  
>>> order, the
>>> hash space would be infinite, so that the churn can be solved by  
>>> load
>>> balacing under any circumstances.
>>>
>>> *best regards,
>>> hanzhu
>>>
>>>
>>> On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
>>> avinash.lakshman@gmail.com> wrote:
>>>
>>>> So how are you coming up with the tokens here? What do you mean by
>>>> string[0]
>>>> and string[last]? Are they the keys that belong to the system? If  
>>>> so
>> then
>>>> that is not a mechanism to reason about the range of your hash
>> function.
>>> In
>>>> this scheme does it mean you will need to sort the tokens too using
>>>> collation scheme used by the external partitioner while identifying
>> which
>>>> key goes to which node. Also could you provide me with the patch
>> number.
>>> I
>>>> need to go over that this weekend and make sure if it does not  
>>>> affect
>> the
>>>> (bootstrap/load balance) logic. My fear is that these changes  
>>>> have long
>>>> reaching effects and I need to make sure that all these pieces will
>>>> continue
>>>> to reside in harmony. Also your range query test did it run in a
>>>> distributed
>>>> environment or in a single box environment?
>>>> Avinash
>>>>
>>>> On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com>
>>> wrote:
>>>>
>>>>> So with a String ring you just sort the strings (conceptually) and
>> you
>>>>> wrap around from strings[last] to strings[0].  The range is  
>>>>> entirely
>>>>> defined by what tokens you have.
>>>>>
>>>>> It's nice to be able to test against a real application.  Mine  
>>>>> won't
>>>>> run without range queries...  (So it is still running against my
>>>>> github code where I first wrote range queries, but that uses the  
>>>>> OPHF
>>>>> approach you have and so that is where I found the problems.)
>>>>>
>>>>> -Jonathan
>>>>>
>>>>> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
>>>>> <av...@gmail.com> wrote:
>>>>>> I will look into this. Another point about P2P rings is something
>>> about
>>>>> the
>>>>>> ability to reason about the system. In any hash fn we hve a  
>>>>>> notion
>> of
>>>>> range
>>>>>> and how the ring wraps around. For eg. in random hash such as MD5
>> we
>>>> say
>>>>>> 0-2^128 -1 and then the wrap around. Similarly for the hash fn  
>>>>>> used
>>>> here
>>>>> one
>>>>>> could define the same. Today we perform usual sort of strings and
>> the
>>>>> hash
>>>>>> should respect that sort order. I guess you are saying the usual
>> sort
>>>> of
>>>>>> string does not respect collation. I will look into this and I
>> insist
>>>>> this
>>>>>> should be very doable. Changes to the implementation of the core
>>>> classes
>>>>>> albeit how simple they are very very scary unless they are
>> completely
>>>>> tested
>>>>>> too in a distributed setting. Over here we do not have detailed
>> test
>>>>> code,
>>>>>> but we test by directing a % of the site traffic to a test  
>>>>>> cluster
>>>> before
>>>>> we
>>>>>> sign off on anything.
>>>>>>
>>>>>> Avinash
>>>>>>
>>>>>> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis  
>>>>>> <jb...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>>> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
>> and
>>>>>>> you are going to do range queries on them.  The correct
>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But  
>>>>>>> ordering by
>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>
>>>>>>> Attached is a test program illustrating this.  The assert will
>> fail
>>>>>>> because the hash ordering is not the same as the collator's.
>>>>>>>
>>>>>>> -Jonathan
>>>>>>>
>>>>>>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>>>>>>> <av...@gmail.com> wrote:
>>>>>>>> In fact what would you want is hash to respect oorder based on
>> how
>>>> the
>>>>>>>> strings are sorted. For the example you have it does. So am I
>>>> missing
>>>>>>>> something?
>>>>>>>>
>>>>>>>> Avinash
>>>>>>>>
>>>>>>>> On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <
>> jbellis@gmail.com
>>>>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> For that aspect no difference between a String ring based on
>>>>> compareTo
>>>>>>>>> and a BigInteger one.  The only difference (and it is an
>>> important
>>>>> one
>>>>>>>>> for the reasons I gave!) is how the compare works.  But for  
>>>>>>>>> the
>>> p2p
>>>>>>>>> aspect it does not matter.
>>>>>>>>>
>>>>>>>>> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>>>>>>>>> <av...@gmail.com> wrote:
>>>>>>>>>> Doing what you are suggesting scares the hell out of me for a
>>>>> couple
>>>>>>> of
>>>>>>>>>> reasons -  All work in P2P be it random/OPHF does the token
>>>>> handling
>>>>>>> the
>>>>>>>>> way
>>>>>>>>>> it is setup. I cannot try something that has not been well
>>>> explored
>>>>> in
>>>>>>>>>> academia. I insist this must be doable. I am going to think
>>> about
>>>>> this
>>>>>>>>> more.
>>>>>>>>>>
>>>>>>>>>> Avinash
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
>>>> jbellis@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Avinash already commited his new order-preserving hash
>>> function
>>>>> and I
>>>>>>>>>>> missed it.  It's in OrderPreservingHashPartitioner.  It
>> takes
>>>> the
>>>>>>>>>>> approach that Todd and I discussed back in January: turn the
>>>>> string
>>>>>>>>>>> into a base-Char.MAX_VALUE number.
>>>>>>>>>>> (
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>>>>>>>>>>> ).
>>>>>>>>>>> I chatted with Avinash a little on IM but we didn't finish,
>>> so
>>>>> I'm
>>>>>>>>>>> picking it up here.
>>>>>>>>>>>
>>>>>>>>>>> There are two problems with this approach.  One is that the
>>>> hashes
>>>>>>>>>>> will only be order-preserving for a subset of unicode (all
>> of
>>>>> UCS-2,
>>>>>>>>>>> but not all of UTF-16; see
>>>>> http://en.wikipedia.org/wiki/UTF-16/UCS-2
>>>>>>> ).
>>>>>>>>>>> The other is that this only gives you a naive ordering by
>>> code
>>>>> point
>>>>>>>>>>> value, which for unicode is not what you want and even for
>>> ascii
>>>>>>>>>>> sometimes you want another collation.  (see
>>>>>>>>>>> http://www.unicode.org/reports/tr10/ and
>>>>>>>>>>>
>> http://java.sun.com/javase/6/docs/api/java/text/Collator.html
>>> ).
>>>>>>>>>>>
>>>>>>>>>>> Say for instance you have inserted keys ['a', 'b', '--a',
>>> '--b']
>>>>> and
>>>>>>>>>>> you are going to do range queries on them.  The correct
>>>>>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But
>>> ordering
>>>> by
>>>>>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>>>>>
>>>>>>>>>>> Switching to a more flexibile system like the one I wrote
>> for
>>>>>>>>>>> CASSANDRA-3 will let use use Token<BigInteger> for random
>>>>>>> distribution
>>>>>>>>>>> or Token<String> for order-preserving, with user-defined
>>>>> collation.
>>>>>>> I
>>>>>>>>>>> don't see a way to get this kind of flexibility from an
>>> approach
>>>>> that
>>>>>>>>>>> insists on turning everything into BigInteger.
>>>>>>>>>>>
>>>>>>>>>>> -Jonathan
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
>>>>> jbellis@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>>>> Avinash,
>>>>>>>>>>>>
>>>>>>>>>>>> You mentioned that you have a new order-preserving hash
>>>> function
>>>>>>> that
>>>>>>>>>>>> you think will be more generally useful.  Can you post it?
>>>>>>>>>>>>
>>>>>>>>>>>> thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -Jonathan
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Re: OPHF

Posted by Zhu Han <sc...@gmail.com>.

Avinash,

(For cassandra, the "key" mentioned in following words is the name of the
row. The "token" mentioned in following words, is the output of the hash
function. )

For example, if you use 160bit md5 hash value as the space, it's finite
theoretically, because you know the number of tokens which can be contained
ahead of time, although the number is pretty large. For finite hash space,
the mapping from key to token is many to one instead of one to one.

For OHPF, a lot of keys could be mapped to  the same token range because of
the non-uniform distribution just as you pointed out. The load balancing can
solve the problem. However, if too may keys mapped to *the same token*
which is determined by the hash function, how can the load balance be
carried out? When the hash space is finite, it's possible that a large
number of keys are mapped to the same token. However, if the hash function
is uniformly distributed, just as MD5 or SHA1, the probability of the worse
case is pretty rare, at least this case  cannot be produced by human
purposely,  so that's not a problem. But for OPHF, if the clients pick up
the name of key purposely, e.g. for DOS attack, the worst case can occur
practically.  When the attacker knows the algorithm of the OPHF, he can
produce arbitrary number of keys mapped to the same token.

If the hash space is infinite, or it's as large as the possible space of
keys, this would not be a problem, because the key to token mapping is one
to one instead of many to one .

best regards,
hanzhu


On Sat, Apr 4, 2009 at 11:59 PM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> Not sure what you mean by infinite. OPHF or not any hash function's range
> is
> practically infinite for MD5 is infinite hash range. With OPHF you will
> have
> a lot of skew. There are only two ways to deal with it:
> (1) You know the space of of all your keys ahead of time and have your
> system exploit that.
> (2) Dynamically load balance until your system reaches steady state.
>
> We are always sorting lexicographically and trying to do both of the above.
>
> Avinash
>
> On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <sc...@gmail.com> wrote:
>
> > Avinash,
> >
> > I support the proposal of Jonathan.
> >
> > Theoretically,  if the system wants to support range query over
> consistent
> > hash(DHT),  the hash space should be infinite. Otherwise,  a large number
> > of
> > keys could be mapped to the same token and  it's impossible to do load
> > balance by moving these keys around to other physical servers.
>  Especially,
> > the churn of OPHF could be very high because the user would like to
> create
> > a
> > lot of keys with the same long prefix for some applications. If the
> system
> > takes the string as the token and sorts them in *lexicographic order, the
> > hash space would be infinite, so that the churn can be solved by load
> > balacing under any circumstances.
> >
> > *best regards,
> > hanzhu
> >
> >
> > On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
> > avinash.lakshman@gmail.com> wrote:
> >
> > > So how are you coming up with the tokens here? What do you mean by
> > > string[0]
> > > and string[last]? Are they the keys that belong to the system? If so
> then
> > > that is not a mechanism to reason about the range of your hash
> function.
> > In
> > > this scheme does it mean you will need to sort the tokens too using
> > > collation scheme used by the external partitioner while identifying
> which
> > > key goes to which node. Also could you provide me with the patch
> number.
> > I
> > > need to go over that this weekend and make sure if it does not affect
> the
> > > (bootstrap/load balance) logic. My fear is that these changes have long
> > > reaching effects and I need to make sure that all these pieces will
> > > continue
> > > to reside in harmony. Also your range query test did it run in a
> > > distributed
> > > environment or in a single box environment?
> > > Avinash
> > >
> > > On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com>
> > wrote:
> > >
> > > > So with a String ring you just sort the strings (conceptually) and
> you
> > > > wrap around from strings[last] to strings[0].  The range is entirely
> > > > defined by what tokens you have.
> > > >
> > > > It's nice to be able to test against a real application.  Mine won't
> > > > run without range queries...  (So it is still running against my
> > > > github code where I first wrote range queries, but that uses the OPHF
> > > > approach you have and so that is where I found the problems.)
> > > >
> > > > -Jonathan
> > > >
> > > > On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> > > > <av...@gmail.com> wrote:
> > > > > I will look into this. Another point about P2P rings is something
> > about
> > > > the
> > > > > ability to reason about the system. In any hash fn we hve a notion
> of
> > > > range
> > > > > and how the ring wraps around. For eg. in random hash such as MD5
> we
> > > say
> > > > > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used
> > > here
> > > > one
> > > > > could define the same. Today we perform usual sort of strings and
> the
> > > > hash
> > > > > should respect that sort order. I guess you are saying the usual
> sort
> > > of
> > > > > string does not respect collation. I will look into this and I
> insist
> > > > this
> > > > > should be very doable. Changes to the implementation of the core
> > > classes
> > > > > albeit how simple they are very very scary unless they are
> completely
> > > > tested
> > > > > too in a distributed setting. Over here we do not have detailed
> test
> > > > code,
> > > > > but we test by directing a % of the site traffic to a test cluster
> > > before
> > > > we
> > > > > sign off on anything.
> > > > >
> > > > > Avinash
> > > > >
> > > > > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
> > > > wrote:
> > > > >
> > > > >> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
> and
> > > > >> you are going to do range queries on them.  The correct
> > > > >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> > > > >> char value gives ['--a', '--b', 'a', 'b'].
> > > > >>
> > > > >> Attached is a test program illustrating this.  The assert will
> fail
> > > > >> because the hash ordering is not the same as the collator's.
> > > > >>
> > > > >> -Jonathan
> > > > >>
> > > > >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> > > > >> <av...@gmail.com> wrote:
> > > > >> > In fact what would you want is hash to respect oorder based on
> how
> > > the
> > > > >> > strings are sorted. For the example you have it does. So am I
> > > missing
> > > > >> > something?
> > > > >> >
> > > > >> > Avinash
> > > > >> >
> > > > >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <
> jbellis@gmail.com
> > >
> > > > >> wrote:
> > > > >> >
> > > > >> >> For that aspect no difference between a String ring based on
> > > > compareTo
> > > > >> >> and a BigInteger one.  The only difference (and it is an
> > important
> > > > one
> > > > >> >> for the reasons I gave!) is how the compare works.  But for the
> > p2p
> > > > >> >> aspect it does not matter.
> > > > >> >>
> > > > >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> > > > >> >> <av...@gmail.com> wrote:
> > > > >> >> > Doing what you are suggesting scares the hell out of me for a
> > > > couple
> > > > >> of
> > > > >> >> > reasons -  All work in P2P be it random/OPHF does the token
> > > > handling
> > > > >> the
> > > > >> >> way
> > > > >> >> > it is setup. I cannot try something that has not been well
> > > explored
> > > > in
> > > > >> >> > academia. I insist this must be doable. I am going to think
> > about
> > > > this
> > > > >> >> more.
> > > > >> >> >
> > > > >> >> > Avinash
> > > > >> >> >
> > > > >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
> > > jbellis@gmail.com>
> > > > >> >> wrote:
> > > > >> >> >
> > > > >> >> >> Avinash already commited his new order-preserving hash
> > function
> > > > and I
> > > > >> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It
> takes
> > > the
> > > > >> >> >> approach that Todd and I discussed back in January: turn the
> > > > string
> > > > >> >> >> into a base-Char.MAX_VALUE number.
> > > > >> >> >> (
> > > > >> >> >>
> > > > >> >>
> > > > >>
> > > >
> > >
> >
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> > > > >> >> >> ).
> > > > >> >> >>  I chatted with Avinash a little on IM but we didn't finish,
> > so
> > > > I'm
> > > > >> >> >> picking it up here.
> > > > >> >> >>
> > > > >> >> >> There are two problems with this approach.  One is that the
> > > hashes
> > > > >> >> >> will only be order-preserving for a subset of unicode (all
> of
> > > > UCS-2,
> > > > >> >> >> but not all of UTF-16; see
> > > > http://en.wikipedia.org/wiki/UTF-16/UCS-2
> > > > >> ).
> > > > >> >> >>  The other is that this only gives you a naive ordering by
> > code
> > > > point
> > > > >> >> >> value, which for unicode is not what you want and even for
> > ascii
> > > > >> >> >> sometimes you want another collation.  (see
> > > > >> >> >> http://www.unicode.org/reports/tr10/ and
> > > > >> >> >>
> http://java.sun.com/javase/6/docs/api/java/text/Collator.html
> > ).
> > > > >> >> >>
> > > > >> >> >> Say for instance you have inserted keys ['a', 'b', '--a',
> > '--b']
> > > > and
> > > > >> >> >> you are going to do range queries on them.  The correct
> > > > >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But
> > ordering
> > > by
> > > > >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> > > > >> >> >>
> > > > >> >> >> Switching to a more flexibile system like the one I wrote
> for
> > > > >> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> > > > >> distribution
> > > > >> >> >> or Token<String> for order-preserving, with user-defined
> > > > collation.
> > > > >>  I
> > > > >> >> >> don't see a way to get this kind of flexibility from an
> > approach
> > > > that
> > > > >> >> >> insists on turning everything into BigInteger.
> > > > >> >> >>
> > > > >> >> >> -Jonathan
> > > > >> >> >>
> > > > >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
> > > > jbellis@gmail.com>
> > > > >> >> wrote:
> > > > >> >> >> > Avinash,
> > > > >> >> >> >
> > > > >> >> >> > You mentioned that you have a new order-preserving hash
> > > function
> > > > >> that
> > > > >> >> >> > you think will be more generally useful.  Can you post it?
> > > > >> >> >> >
> > > > >> >> >> > thanks,
> > > > >> >> >> >
> > > > >> >> >> > -Jonathan
> > > > >> >> >> >
> > > > >> >> >>
> > > > >> >> >
> > > > >> >>
> > > > >> >
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

Not sure what you mean by infinite. OPHF or not any hash function's range is
practically infinite for MD5 is infinite hash range. With OPHF you will have
a lot of skew. There are only two ways to deal with it:
(1) You know the space of of all your keys ahead of time and have your
system exploit that.
(2) Dynamically load balance until your system reaches steady state.

We are always sorting lexicographically and trying to do both of the above.

Avinash

On Sat, Apr 4, 2009 at 12:18 AM, Zhu Han <sc...@gmail.com> wrote:

> Avinash,
>
> I support the proposal of Jonathan.
>
> Theoretically,  if the system wants to support range query over consistent
> hash(DHT),  the hash space should be infinite. Otherwise,  a large number
> of
> keys could be mapped to the same token and  it's impossible to do load
> balance by moving these keys around to other physical servers.  Especially,
> the churn of OPHF could be very high because the user would like to create
> a
> lot of keys with the same long prefix for some applications. If the system
> takes the string as the token and sorts them in *lexicographic order, the
> hash space would be infinite, so that the churn can be solved by load
> balacing under any circumstances.
>
> *best regards,
> hanzhu
>
>
> On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
> avinash.lakshman@gmail.com> wrote:
>
> > So how are you coming up with the tokens here? What do you mean by
> > string[0]
> > and string[last]? Are they the keys that belong to the system? If so then
> > that is not a mechanism to reason about the range of your hash function.
> In
> > this scheme does it mean you will need to sort the tokens too using
> > collation scheme used by the external partitioner while identifying which
> > key goes to which node. Also could you provide me with the patch number.
> I
> > need to go over that this weekend and make sure if it does not affect the
> > (bootstrap/load balance) logic. My fear is that these changes have long
> > reaching effects and I need to make sure that all these pieces will
> > continue
> > to reside in harmony. Also your range query test did it run in a
> > distributed
> > environment or in a single box environment?
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> > > So with a String ring you just sort the strings (conceptually) and you
> > > wrap around from strings[last] to strings[0].  The range is entirely
> > > defined by what tokens you have.
> > >
> > > It's nice to be able to test against a real application.  Mine won't
> > > run without range queries...  (So it is still running against my
> > > github code where I first wrote range queries, but that uses the OPHF
> > > approach you have and so that is where I found the problems.)
> > >
> > > -Jonathan
> > >
> > > On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> > > <av...@gmail.com> wrote:
> > > > I will look into this. Another point about P2P rings is something
> about
> > > the
> > > > ability to reason about the system. In any hash fn we hve a notion of
> > > range
> > > > and how the ring wraps around. For eg. in random hash such as MD5 we
> > say
> > > > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used
> > here
> > > one
> > > > could define the same. Today we perform usual sort of strings and the
> > > hash
> > > > should respect that sort order. I guess you are saying the usual sort
> > of
> > > > string does not respect collation. I will look into this and I insist
> > > this
> > > > should be very doable. Changes to the implementation of the core
> > classes
> > > > albeit how simple they are very very scary unless they are completely
> > > tested
> > > > too in a distributed setting. Over here we do not have detailed test
> > > code,
> > > > but we test by directing a % of the site traffic to a test cluster
> > before
> > > we
> > > > sign off on anything.
> > > >
> > > > Avinash
> > > >
> > > > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
> > > wrote:
> > > >
> > > >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> > > >> you are going to do range queries on them.  The correct
> > > >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> > > >> char value gives ['--a', '--b', 'a', 'b'].
> > > >>
> > > >> Attached is a test program illustrating this.  The assert will fail
> > > >> because the hash ordering is not the same as the collator's.
> > > >>
> > > >> -Jonathan
> > > >>
> > > >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> > > >> <av...@gmail.com> wrote:
> > > >> > In fact what would you want is hash to respect oorder based on how
> > the
> > > >> > strings are sorted. For the example you have it does. So am I
> > missing
> > > >> > something?
> > > >> >
> > > >> > Avinash
> > > >> >
> > > >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jbellis@gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> >> For that aspect no difference between a String ring based on
> > > compareTo
> > > >> >> and a BigInteger one.  The only difference (and it is an
> important
> > > one
> > > >> >> for the reasons I gave!) is how the compare works.  But for the
> p2p
> > > >> >> aspect it does not matter.
> > > >> >>
> > > >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> > > >> >> <av...@gmail.com> wrote:
> > > >> >> > Doing what you are suggesting scares the hell out of me for a
> > > couple
> > > >> of
> > > >> >> > reasons -  All work in P2P be it random/OPHF does the token
> > > handling
> > > >> the
> > > >> >> way
> > > >> >> > it is setup. I cannot try something that has not been well
> > explored
> > > in
> > > >> >> > academia. I insist this must be doable. I am going to think
> about
> > > this
> > > >> >> more.
> > > >> >> >
> > > >> >> > Avinash
> > > >> >> >
> > > >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
> > jbellis@gmail.com>
> > > >> >> wrote:
> > > >> >> >
> > > >> >> >> Avinash already commited his new order-preserving hash
> function
> > > and I
> > > >> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes
> > the
> > > >> >> >> approach that Todd and I discussed back in January: turn the
> > > string
> > > >> >> >> into a base-Char.MAX_VALUE number.
> > > >> >> >> (
> > > >> >> >>
> > > >> >>
> > > >>
> > >
> >
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> > > >> >> >> ).
> > > >> >> >>  I chatted with Avinash a little on IM but we didn't finish,
> so
> > > I'm
> > > >> >> >> picking it up here.
> > > >> >> >>
> > > >> >> >> There are two problems with this approach.  One is that the
> > hashes
> > > >> >> >> will only be order-preserving for a subset of unicode (all of
> > > UCS-2,
> > > >> >> >> but not all of UTF-16; see
> > > http://en.wikipedia.org/wiki/UTF-16/UCS-2
> > > >> ).
> > > >> >> >>  The other is that this only gives you a naive ordering by
> code
> > > point
> > > >> >> >> value, which for unicode is not what you want and even for
> ascii
> > > >> >> >> sometimes you want another collation.  (see
> > > >> >> >> http://www.unicode.org/reports/tr10/ and
> > > >> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html
> ).
> > > >> >> >>
> > > >> >> >> Say for instance you have inserted keys ['a', 'b', '--a',
> '--b']
> > > and
> > > >> >> >> you are going to do range queries on them.  The correct
> > > >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But
> ordering
> > by
> > > >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> > > >> >> >>
> > > >> >> >> Switching to a more flexibile system like the one I wrote for
> > > >> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> > > >> distribution
> > > >> >> >> or Token<String> for order-preserving, with user-defined
> > > collation.
> > > >>  I
> > > >> >> >> don't see a way to get this kind of flexibility from an
> approach
> > > that
> > > >> >> >> insists on turning everything into BigInteger.
> > > >> >> >>
> > > >> >> >> -Jonathan
> > > >> >> >>
> > > >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
> > > jbellis@gmail.com>
> > > >> >> wrote:
> > > >> >> >> > Avinash,
> > > >> >> >> >
> > > >> >> >> > You mentioned that you have a new order-preserving hash
> > function
> > > >> that
> > > >> >> >> > you think will be more generally useful.  Can you post it?
> > > >> >> >> >
> > > >> >> >> > thanks,
> > > >> >> >> >
> > > >> >> >> > -Jonathan
> > > >> >> >> >
> > > >> >> >>
> > > >> >> >
> > > >> >>
> > > >> >
> > > >>
> > > >
> > >
> >
>

Re: OPHF

Posted by Zhu Han <sc...@gmail.com>.

Avinash,

I support the proposal of Jonathan.

Theoretically,  if the system wants to support range query over consistent
hash(DHT),  the hash space should be infinite. Otherwise,  a large number of
keys could be mapped to the same token and  it's impossible to do load
balance by moving these keys around to other physical servers.  Especially,
the churn of OPHF could be very high because the user would like to create a
lot of keys with the same long prefix for some applications. If the system
takes the string as the token and sorts them in *lexicographic order, the
hash space would be infinite, so that the churn can be solved by load
balacing under any circumstances.

*best regards,
hanzhu


On Thu, Apr 2, 2009 at 11:04 PM, Avinash Lakshman <
avinash.lakshman@gmail.com> wrote:

> So how are you coming up with the tokens here? What do you mean by
> string[0]
> and string[last]? Are they the keys that belong to the system? If so then
> that is not a mechanism to reason about the range of your hash function. In
> this scheme does it mean you will need to sort the tokens too using
> collation scheme used by the external partitioner while identifying which
> key goes to which node. Also could you provide me with the patch number. I
> need to go over that this weekend and make sure if it does not affect the
> (bootstrap/load balance) logic. My fear is that these changes have long
> reaching effects and I need to make sure that all these pieces will
> continue
> to reside in harmony. Also your range query test did it run in a
> distributed
> environment or in a single box environment?
> Avinash
>
> On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
> > So with a String ring you just sort the strings (conceptually) and you
> > wrap around from strings[last] to strings[0].  The range is entirely
> > defined by what tokens you have.
> >
> > It's nice to be able to test against a real application.  Mine won't
> > run without range queries...  (So it is still running against my
> > github code where I first wrote range queries, but that uses the OPHF
> > approach you have and so that is where I found the problems.)
> >
> > -Jonathan
> >
> > On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> > <av...@gmail.com> wrote:
> > > I will look into this. Another point about P2P rings is something about
> > the
> > > ability to reason about the system. In any hash fn we hve a notion of
> > range
> > > and how the ring wraps around. For eg. in random hash such as MD5 we
> say
> > > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used
> here
> > one
> > > could define the same. Today we perform usual sort of strings and the
> > hash
> > > should respect that sort order. I guess you are saying the usual sort
> of
> > > string does not respect collation. I will look into this and I insist
> > this
> > > should be very doable. Changes to the implementation of the core
> classes
> > > albeit how simple they are very very scary unless they are completely
> > tested
> > > too in a distributed setting. Over here we do not have detailed test
> > code,
> > > but we test by directing a % of the site traffic to a test cluster
> before
> > we
> > > sign off on anything.
> > >
> > > Avinash
> > >
> > > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
> > wrote:
> > >
> > >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> > >> you are going to do range queries on them.  The correct
> > >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> > >> char value gives ['--a', '--b', 'a', 'b'].
> > >>
> > >> Attached is a test program illustrating this.  The assert will fail
> > >> because the hash ordering is not the same as the collator's.
> > >>
> > >> -Jonathan
> > >>
> > >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> > >> <av...@gmail.com> wrote:
> > >> > In fact what would you want is hash to respect oorder based on how
> the
> > >> > strings are sorted. For the example you have it does. So am I
> missing
> > >> > something?
> > >> >
> > >> > Avinash
> > >> >
> > >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
> > >> wrote:
> > >> >
> > >> >> For that aspect no difference between a String ring based on
> > compareTo
> > >> >> and a BigInteger one.  The only difference (and it is an important
> > one
> > >> >> for the reasons I gave!) is how the compare works.  But for the p2p
> > >> >> aspect it does not matter.
> > >> >>
> > >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> > >> >> <av...@gmail.com> wrote:
> > >> >> > Doing what you are suggesting scares the hell out of me for a
> > couple
> > >> of
> > >> >> > reasons -  All work in P2P be it random/OPHF does the token
> > handling
> > >> the
> > >> >> way
> > >> >> > it is setup. I cannot try something that has not been well
> explored
> > in
> > >> >> > academia. I insist this must be doable. I am going to think about
> > this
> > >> >> more.
> > >> >> >
> > >> >> > Avinash
> > >> >> >
> > >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <
> jbellis@gmail.com>
> > >> >> wrote:
> > >> >> >
> > >> >> >> Avinash already commited his new order-preserving hash function
> > and I
> > >> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes
> the
> > >> >> >> approach that Todd and I discussed back in January: turn the
> > string
> > >> >> >> into a base-Char.MAX_VALUE number.
> > >> >> >> (
> > >> >> >>
> > >> >>
> > >>
> >
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> > >> >> >> ).
> > >> >> >>  I chatted with Avinash a little on IM but we didn't finish, so
> > I'm
> > >> >> >> picking it up here.
> > >> >> >>
> > >> >> >> There are two problems with this approach.  One is that the
> hashes
> > >> >> >> will only be order-preserving for a subset of unicode (all of
> > UCS-2,
> > >> >> >> but not all of UTF-16; see
> > http://en.wikipedia.org/wiki/UTF-16/UCS-2
> > >> ).
> > >> >> >>  The other is that this only gives you a naive ordering by code
> > point
> > >> >> >> value, which for unicode is not what you want and even for ascii
> > >> >> >> sometimes you want another collation.  (see
> > >> >> >> http://www.unicode.org/reports/tr10/ and
> > >> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> > >> >> >>
> > >> >> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
> > and
> > >> >> >> you are going to do range queries on them.  The correct
> > >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering
> by
> > >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> > >> >> >>
> > >> >> >> Switching to a more flexibile system like the one I wrote for
> > >> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> > >> distribution
> > >> >> >> or Token<String> for order-preserving, with user-defined
> > collation.
> > >>  I
> > >> >> >> don't see a way to get this kind of flexibility from an approach
> > that
> > >> >> >> insists on turning everything into BigInteger.
> > >> >> >>
> > >> >> >> -Jonathan
> > >> >> >>
> > >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
> > jbellis@gmail.com>
> > >> >> wrote:
> > >> >> >> > Avinash,
> > >> >> >> >
> > >> >> >> > You mentioned that you have a new order-preserving hash
> function
> > >> that
> > >> >> >> > you think will be more generally useful.  Can you post it?
> > >> >> >> >
> > >> >> >> > thanks,
> > >> >> >> >
> > >> >> >> > -Jonathan
> > >> >> >> >
> > >> >> >>
> > >> >> >
> > >> >>
> > >> >
> > >>
> > >
> >
>

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Apr 2, 2009 at 9:04 AM, Avinash Lakshman
<av...@gmail.com> wrote:
> So how are you coming up with the tokens here?

For Token<String> the key's tokens (that is, the value we compare
against the node tokens to determine partitioning) are just the key
itself, wrapped in the Token interface.

Node token generation gets moved into the IPartitioner interface, and
the partitioner is loaded with class.forName, so users can easily
define a token generator that meets their needs.  (For instance for my
app I will probably need to hand-specify tokens in a config file at
first until I can work on load balancing.)  But there is a random
string generator that will work out of the box for testing.

> What do you mean by string[0]
> and string[last]? Are they the keys that belong to the system?

I meant the tokens assigned to the nodes.  So, the "next" node after
the last node is the first node.  It doesn't matter where the
wraparound point is for the key domain (there usually won't be one,
for strings), but that's okay, because the important concept is not
the wrap point but what the "next" node is for replication etc.

That is, the algorithm for getStorageEndPoints is unchanged -- get a
sorted list of tokens/node mappings, and binary search on the key's
token.  Doesn't matter if the tokens are BigIntegers or Strings.

> In
> this scheme does it mean you will need to sort the tokens too using
> collation scheme used by the external partitioner while identifying which
> key goes to which node.

Right.  I add a compareTo method (and reverseCompareTo for
convenience) to the IPartitioner interface and that needs to be used
wherever we were comparing or sorting.

> Also could you provide me with the patch number. I
> need to go over that this weekend and make sure if it does not affect the
> (bootstrap/load balance) logic.

The old ones from #3 won't apply any more because of conflicts but I
will rebase them against the current trunk.  (When do you expect to
commit again?  I can wait for that to make sure it will be a clean
apply if that is in the near future.)

> My fear is that these changes have long
> reaching effects and I need to make sure that all these pieces will continue
> to reside in harmony. Also your range query test did it run in a distributed
> environment or in a single box environment?

We're testing on a five-node cluster.

-Jonathan

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

So how are you coming up with the tokens here? What do you mean by string[0]
and string[last]? Are they the keys that belong to the system? If so then
that is not a mechanism to reason about the range of your hash function. In
this scheme does it mean you will need to sort the tokens too using
collation scheme used by the external partitioner while identifying which
key goes to which node. Also could you provide me with the patch number. I
need to go over that this weekend and make sure if it does not affect the
(bootstrap/load balance) logic. My fear is that these changes have long
reaching effects and I need to make sure that all these pieces will continue
to reside in harmony. Also your range query test did it run in a distributed
environment or in a single box environment?
Avinash

On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> So with a String ring you just sort the strings (conceptually) and you
> wrap around from strings[last] to strings[0].  The range is entirely
> defined by what tokens you have.
>
> It's nice to be able to test against a real application.  Mine won't
> run without range queries...  (So it is still running against my
> github code where I first wrote range queries, but that uses the OPHF
> approach you have and so that is where I found the problems.)
>
> -Jonathan
>
> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> <av...@gmail.com> wrote:
> > I will look into this. Another point about P2P rings is something about
> the
> > ability to reason about the system. In any hash fn we hve a notion of
> range
> > and how the ring wraps around. For eg. in random hash such as MD5 we say
> > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used here
> one
> > could define the same. Today we perform usual sort of strings and the
> hash
> > should respect that sort order. I guess you are saying the usual sort of
> > string does not respect collation. I will look into this and I insist
> this
> > should be very doable. Changes to the implementation of the core classes
> > albeit how simple they are very very scary unless they are completely
> tested
> > too in a distributed setting. Over here we do not have detailed test
> code,
> > but we test by directing a % of the site traffic to a test cluster before
> we
> > sign off on anything.
> >
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> >> you are going to do range queries on them.  The correct
> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> char value gives ['--a', '--b', 'a', 'b'].
> >>
> >> Attached is a test program illustrating this.  The assert will fail
> >> because the hash ordering is not the same as the collator's.
> >>
> >> -Jonathan
> >>
> >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> >> <av...@gmail.com> wrote:
> >> > In fact what would you want is hash to respect oorder based on how the
> >> > strings are sorted. For the example you have it does. So am I missing
> >> > something?
> >> >
> >> > Avinash
> >> >
> >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >> >
> >> >> For that aspect no difference between a String ring based on
> compareTo
> >> >> and a BigInteger one.  The only difference (and it is an important
> one
> >> >> for the reasons I gave!) is how the compare works.  But for the p2p
> >> >> aspect it does not matter.
> >> >>
> >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> >> >> <av...@gmail.com> wrote:
> >> >> > Doing what you are suggesting scares the hell out of me for a
> couple
> >> of
> >> >> > reasons -  All work in P2P be it random/OPHF does the token
> handling
> >> the
> >> >> way
> >> >> > it is setup. I cannot try something that has not been well explored
> in
> >> >> > academia. I insist this must be doable. I am going to think about
> this
> >> >> more.
> >> >> >
> >> >> > Avinash
> >> >> >
> >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Avinash already commited his new order-preserving hash function
> and I
> >> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> >> >> >> approach that Todd and I discussed back in January: turn the
> string
> >> >> >> into a base-Char.MAX_VALUE number.
> >> >> >> (
> >> >> >>
> >> >>
> >>
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> >> >> >> ).
> >> >> >>  I chatted with Avinash a little on IM but we didn't finish, so
> I'm
> >> >> >> picking it up here.
> >> >> >>
> >> >> >> There are two problems with this approach.  One is that the hashes
> >> >> >> will only be order-preserving for a subset of unicode (all of
> UCS-2,
> >> >> >> but not all of UTF-16; see
> http://en.wikipedia.org/wiki/UTF-16/UCS-2
> >> ).
> >> >> >>  The other is that this only gives you a naive ordering by code
> point
> >> >> >> value, which for unicode is not what you want and even for ascii
> >> >> >> sometimes you want another collation.  (see
> >> >> >> http://www.unicode.org/reports/tr10/ and
> >> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> >> >> >>
> >> >> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
> and
> >> >> >> you are going to do range queries on them.  The correct
> >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> >> >> >>
> >> >> >> Switching to a more flexibile system like the one I wrote for
> >> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> >> distribution
> >> >> >> or Token<String> for order-preserving, with user-defined
> collation.
> >>  I
> >> >> >> don't see a way to get this kind of flexibility from an approach
> that
> >> >> >> insists on turning everything into BigInteger.
> >> >> >>
> >> >> >> -Jonathan
> >> >> >>
> >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
> jbellis@gmail.com>
> >> >> wrote:
> >> >> >> > Avinash,
> >> >> >> >
> >> >> >> > You mentioned that you have a new order-preserving hash function
> >> that
> >> >> >> > you think will be more generally useful.  Can you post it?
> >> >> >> >
> >> >> >> > thanks,
> >> >> >> >
> >> >> >> > -Jonathan
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: OPHF

Posted by Jonathan ellis <jb...@gmail.com>.

Because that is the ordering the app needs.  (but! Other apps will  
have different needs. That is why it must be pluggable.) Yes, key  
sorting at sstable level needs to be consistent with what the  
partioner uses.  My patch does this.

-Jonathan

On Apr 1, 2009, at 10:43 PM, Avinash Lakshman <avinash.lakshman@gmail.com 
 > wrote:

> So why do you want to sort with collation? Because if you sort that  
> way for
> distribution then everything from sorting of keys on disk will have to
> change. This is not something that I can see any apparent reason  
> for. But I
> will look at this.
> Avinash
>
> On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com>  
> wrote:
>
>> So with a String ring you just sort the strings (conceptually) and  
>> you
>> wrap around from strings[last] to strings[0].  The range is entirely
>> defined by what tokens you have.
>>
>> It's nice to be able to test against a real application.  Mine won't
>> run without range queries...  (So it is still running against my
>> github code where I first wrote range queries, but that uses the OPHF
>> approach you have and so that is where I found the problems.)
>>
>> -Jonathan
>>
>> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
>> <av...@gmail.com> wrote:
>>> I will look into this. Another point about P2P rings is something  
>>> about
>> the
>>> ability to reason about the system. In any hash fn we hve a notion  
>>> of
>> range
>>> and how the ring wraps around. For eg. in random hash such as MD5  
>>> we say
>>> 0-2^128 -1 and then the wrap around. Similarly for the hash fn  
>>> used here
>> one
>>> could define the same. Today we perform usual sort of strings and  
>>> the
>> hash
>>> should respect that sort order. I guess you are saying the usual  
>>> sort of
>>> string does not respect collation. I will look into this and I  
>>> insist
>> this
>>> should be very doable. Changes to the implementation of the core  
>>> classes
>>> albeit how simple they are very very scary unless they are  
>>> completely
>> tested
>>> too in a distributed setting. Over here we do not have detailed test
>> code,
>>> but we test by directing a % of the site traffic to a test cluster  
>>> before
>> we
>>> sign off on anything.
>>>
>>> Avinash
>>>
>>> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>>>
>>>> Say for instance you have inserted keys ['a', 'b', '--a', '--b']  
>>>> and
>>>> you are going to do range queries on them.  The correct
>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>
>>>> Attached is a test program illustrating this.  The assert will fail
>>>> because the hash ordering is not the same as the collator's.
>>>>
>>>> -Jonathan
>>>>
>>>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>>>> <av...@gmail.com> wrote:
>>>>> In fact what would you want is hash to respect oorder based on  
>>>>> how the
>>>>> strings are sorted. For the example you have it does. So am I  
>>>>> missing
>>>>> something?
>>>>>
>>>>> Avinash
>>>>>
>>>>> On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
>>>> wrote:
>>>>>
>>>>>> For that aspect no difference between a String ring based on
>> compareTo
>>>>>> and a BigInteger one.  The only difference (and it is an  
>>>>>> important
>> one
>>>>>> for the reasons I gave!) is how the compare works.  But for the  
>>>>>> p2p
>>>>>> aspect it does not matter.
>>>>>>
>>>>>> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>>>>>> <av...@gmail.com> wrote:
>>>>>>> Doing what you are suggesting scares the hell out of me for a
>> couple
>>>> of
>>>>>>> reasons -  All work in P2P be it random/OPHF does the token
>> handling
>>>> the
>>>>>> way
>>>>>>> it is setup. I cannot try something that has not been well  
>>>>>>> explored
>> in
>>>>>>> academia. I insist this must be doable. I am going to think  
>>>>>>> about
>> this
>>>>>> more.
>>>>>>>
>>>>>>> Avinash
>>>>>>>
>>>>>>> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jbellis@gmail.com 
>>>>>>> >
>>>>>> wrote:
>>>>>>>
>>>>>>>> Avinash already commited his new order-preserving hash function
>> and I
>>>>>>>> missed it.  It's in OrderPreservingHashPartitioner.  It takes  
>>>>>>>> the
>>>>>>>> approach that Todd and I discussed back in January: turn the
>> string
>>>>>>>> into a base-Char.MAX_VALUE number.
>>>>>>>> (
>>>>>>>>
>>>>>>
>>>>
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>>>>>>>> ).
>>>>>>>> I chatted with Avinash a little on IM but we didn't finish, so
>> I'm
>>>>>>>> picking it up here.
>>>>>>>>
>>>>>>>> There are two problems with this approach.  One is that the  
>>>>>>>> hashes
>>>>>>>> will only be order-preserving for a subset of unicode (all of
>> UCS-2,
>>>>>>>> but not all of UTF-16; see
>> http://en.wikipedia.org/wiki/UTF-16/UCS-2
>>>> ).
>>>>>>>> The other is that this only gives you a naive ordering by code
>> point
>>>>>>>> value, which for unicode is not what you want and even for  
>>>>>>>> ascii
>>>>>>>> sometimes you want another collation.  (see
>>>>>>>> http://www.unicode.org/reports/tr10/ and
>>>>>>>> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>>>>>>>>
>>>>>>>> Say for instance you have inserted keys ['a', 'b', '--a', '-- 
>>>>>>>> b']
>> and
>>>>>>>> you are going to do range queries on them.  The correct
>>>>>>>> collation-aware sort is ['a', '--a', 'b', '--b'].  But  
>>>>>>>> ordering by
>>>>>>>> char value gives ['--a', '--b', 'a', 'b'].
>>>>>>>>
>>>>>>>> Switching to a more flexibile system like the one I wrote for
>>>>>>>> CASSANDRA-3 will let use use Token<BigInteger> for random
>>>> distribution
>>>>>>>> or Token<String> for order-preserving, with user-defined
>> collation.
>>>> I
>>>>>>>> don't see a way to get this kind of flexibility from an  
>>>>>>>> approach
>> that
>>>>>>>> insists on turning everything into BigInteger.
>>>>>>>>
>>>>>>>> -Jonathan
>>>>>>>>
>>>>>>>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
>> jbellis@gmail.com>
>>>>>> wrote:
>>>>>>>>> Avinash,
>>>>>>>>>
>>>>>>>>> You mentioned that you have a new order-preserving hash  
>>>>>>>>> function
>>>> that
>>>>>>>>> you think will be more generally useful.  Can you post it?
>>>>>>>>>
>>>>>>>>> thanks,
>>>>>>>>>
>>>>>>>>> -Jonathan
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

So why do you want to sort with collation? Because if you sort that way for
distribution then everything from sorting of keys on disk will have to
change. This is not something that I can see any apparent reason for. But I
will look at this.
Avinash

On Wed, Apr 1, 2009 at 3:34 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> So with a String ring you just sort the strings (conceptually) and you
> wrap around from strings[last] to strings[0].  The range is entirely
> defined by what tokens you have.
>
> It's nice to be able to test against a real application.  Mine won't
> run without range queries...  (So it is still running against my
> github code where I first wrote range queries, but that uses the OPHF
> approach you have and so that is where I found the problems.)
>
> -Jonathan
>
> On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
> <av...@gmail.com> wrote:
> > I will look into this. Another point about P2P rings is something about
> the
> > ability to reason about the system. In any hash fn we hve a notion of
> range
> > and how the ring wraps around. For eg. in random hash such as MD5 we say
> > 0-2^128 -1 and then the wrap around. Similarly for the hash fn used here
> one
> > could define the same. Today we perform usual sort of strings and the
> hash
> > should respect that sort order. I guess you are saying the usual sort of
> > string does not respect collation. I will look into this and I insist
> this
> > should be very doable. Changes to the implementation of the core classes
> > albeit how simple they are very very scary unless they are completely
> tested
> > too in a distributed setting. Over here we do not have detailed test
> code,
> > but we test by directing a % of the site traffic to a test cluster before
> we
> > sign off on anything.
> >
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> >> you are going to do range queries on them.  The correct
> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> char value gives ['--a', '--b', 'a', 'b'].
> >>
> >> Attached is a test program illustrating this.  The assert will fail
> >> because the hash ordering is not the same as the collator's.
> >>
> >> -Jonathan
> >>
> >> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> >> <av...@gmail.com> wrote:
> >> > In fact what would you want is hash to respect oorder based on how the
> >> > strings are sorted. For the example you have it does. So am I missing
> >> > something?
> >> >
> >> > Avinash
> >> >
> >> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >> >
> >> >> For that aspect no difference between a String ring based on
> compareTo
> >> >> and a BigInteger one.  The only difference (and it is an important
> one
> >> >> for the reasons I gave!) is how the compare works.  But for the p2p
> >> >> aspect it does not matter.
> >> >>
> >> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> >> >> <av...@gmail.com> wrote:
> >> >> > Doing what you are suggesting scares the hell out of me for a
> couple
> >> of
> >> >> > reasons -  All work in P2P be it random/OPHF does the token
> handling
> >> the
> >> >> way
> >> >> > it is setup. I cannot try something that has not been well explored
> in
> >> >> > academia. I insist this must be doable. I am going to think about
> this
> >> >> more.
> >> >> >
> >> >> > Avinash
> >> >> >
> >> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
> >> >> wrote:
> >> >> >
> >> >> >> Avinash already commited his new order-preserving hash function
> and I
> >> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> >> >> >> approach that Todd and I discussed back in January: turn the
> string
> >> >> >> into a base-Char.MAX_VALUE number.
> >> >> >> (
> >> >> >>
> >> >>
> >>
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> >> >> >> ).
> >> >> >>  I chatted with Avinash a little on IM but we didn't finish, so
> I'm
> >> >> >> picking it up here.
> >> >> >>
> >> >> >> There are two problems with this approach.  One is that the hashes
> >> >> >> will only be order-preserving for a subset of unicode (all of
> UCS-2,
> >> >> >> but not all of UTF-16; see
> http://en.wikipedia.org/wiki/UTF-16/UCS-2
> >> ).
> >> >> >>  The other is that this only gives you a naive ordering by code
> point
> >> >> >> value, which for unicode is not what you want and even for ascii
> >> >> >> sometimes you want another collation.  (see
> >> >> >> http://www.unicode.org/reports/tr10/ and
> >> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> >> >> >>
> >> >> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b']
> and
> >> >> >> you are going to do range queries on them.  The correct
> >> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> >> >> char value gives ['--a', '--b', 'a', 'b'].
> >> >> >>
> >> >> >> Switching to a more flexibile system like the one I wrote for
> >> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> >> distribution
> >> >> >> or Token<String> for order-preserving, with user-defined
> collation.
> >>  I
> >> >> >> don't see a way to get this kind of flexibility from an approach
> that
> >> >> >> insists on turning everything into BigInteger.
> >> >> >>
> >> >> >> -Jonathan
> >> >> >>
> >> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <
> jbellis@gmail.com>
> >> >> wrote:
> >> >> >> > Avinash,
> >> >> >> >
> >> >> >> > You mentioned that you have a new order-preserving hash function
> >> that
> >> >> >> > you think will be more generally useful.  Can you post it?
> >> >> >> >
> >> >> >> > thanks,
> >> >> >> >
> >> >> >> > -Jonathan
> >> >> >> >
> >> >> >>
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

So with a String ring you just sort the strings (conceptually) and you
wrap around from strings[last] to strings[0].  The range is entirely
defined by what tokens you have.

It's nice to be able to test against a real application.  Mine won't
run without range queries...  (So it is still running against my
github code where I first wrote range queries, but that uses the OPHF
approach you have and so that is where I found the problems.)

-Jonathan

On Wed, Apr 1, 2009 at 4:27 PM, Avinash Lakshman
<av...@gmail.com> wrote:
> I will look into this. Another point about P2P rings is something about the
> ability to reason about the system. In any hash fn we hve a notion of range
> and how the ring wraps around. For eg. in random hash such as MD5 we say
> 0-2^128 -1 and then the wrap around. Similarly for the hash fn used here one
> could define the same. Today we perform usual sort of strings and the hash
> should respect that sort order. I guess you are saying the usual sort of
> string does not respect collation. I will look into this and I insist this
> should be very doable. Changes to the implementation of the core classes
> albeit how simple they are very very scary unless they are completely tested
> too in a distributed setting. Over here we do not have detailed test code,
> but we test by directing a % of the site traffic to a test cluster before we
> sign off on anything.
>
> Avinash
>
> On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
>> you are going to do range queries on them.  The correct
>> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>> char value gives ['--a', '--b', 'a', 'b'].
>>
>> Attached is a test program illustrating this.  The assert will fail
>> because the hash ordering is not the same as the collator's.
>>
>> -Jonathan
>>
>> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
>> <av...@gmail.com> wrote:
>> > In fact what would you want is hash to respect oorder based on how the
>> > strings are sorted. For the example you have it does. So am I missing
>> > something?
>> >
>> > Avinash
>> >
>> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>> >
>> >> For that aspect no difference between a String ring based on compareTo
>> >> and a BigInteger one.  The only difference (and it is an important one
>> >> for the reasons I gave!) is how the compare works.  But for the p2p
>> >> aspect it does not matter.
>> >>
>> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>> >> <av...@gmail.com> wrote:
>> >> > Doing what you are suggesting scares the hell out of me for a couple
>> of
>> >> > reasons -  All work in P2P be it random/OPHF does the token handling
>> the
>> >> way
>> >> > it is setup. I cannot try something that has not been well explored in
>> >> > academia. I insist this must be doable. I am going to think about this
>> >> more.
>> >> >
>> >> > Avinash
>> >> >
>> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
>> >> wrote:
>> >> >
>> >> >> Avinash already commited his new order-preserving hash function and I
>> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
>> >> >> approach that Todd and I discussed back in January: turn the string
>> >> >> into a base-Char.MAX_VALUE number.
>> >> >> (
>> >> >>
>> >>
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>> >> >> ).
>> >> >>  I chatted with Avinash a little on IM but we didn't finish, so I'm
>> >> >> picking it up here.
>> >> >>
>> >> >> There are two problems with this approach.  One is that the hashes
>> >> >> will only be order-preserving for a subset of unicode (all of UCS-2,
>> >> >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2
>> ).
>> >> >>  The other is that this only gives you a naive ordering by code point
>> >> >> value, which for unicode is not what you want and even for ascii
>> >> >> sometimes you want another collation.  (see
>> >> >> http://www.unicode.org/reports/tr10/ and
>> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>> >> >>
>> >> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
>> >> >> you are going to do range queries on them.  The correct
>> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>> >> >> char value gives ['--a', '--b', 'a', 'b'].
>> >> >>
>> >> >> Switching to a more flexibile system like the one I wrote for
>> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
>> distribution
>> >> >> or Token<String> for order-preserving, with user-defined collation.
>>  I
>> >> >> don't see a way to get this kind of flexibility from an approach that
>> >> >> insists on turning everything into BigInteger.
>> >> >>
>> >> >> -Jonathan
>> >> >>
>> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com>
>> >> wrote:
>> >> >> > Avinash,
>> >> >> >
>> >> >> > You mentioned that you have a new order-preserving hash function
>> that
>> >> >> > you think will be more generally useful.  Can you post it?
>> >> >> >
>> >> >> > thanks,
>> >> >> >
>> >> >> > -Jonathan
>> >> >> >
>> >> >>
>> >> >
>> >>
>> >
>>
>

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

I will look into this. Another point about P2P rings is something about the
ability to reason about the system. In any hash fn we hve a notion of range
and how the ring wraps around. For eg. in random hash such as MD5 we say
0-2^128 -1 and then the wrap around. Similarly for the hash fn used here one
could define the same. Today we perform usual sort of strings and the hash
should respect that sort order. I guess you are saying the usual sort of
string does not respect collation. I will look into this and I insist this
should be very doable. Changes to the implementation of the core classes
albeit how simple they are very very scary unless they are completely tested
too in a distributed setting. Over here we do not have detailed test code,
but we test by directing a % of the site traffic to a test cluster before we
sign off on anything.

Avinash

On Wed, Apr 1, 2009 at 3:00 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> you are going to do range queries on them.  The correct
> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> char value gives ['--a', '--b', 'a', 'b'].
>
> Attached is a test program illustrating this.  The assert will fail
> because the hash ordering is not the same as the collator's.
>
> -Jonathan
>
> On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
> <av...@gmail.com> wrote:
> > In fact what would you want is hash to respect oorder based on how the
> > strings are sorted. For the example you have it does. So am I missing
> > something?
> >
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> For that aspect no difference between a String ring based on compareTo
> >> and a BigInteger one.  The only difference (and it is an important one
> >> for the reasons I gave!) is how the compare works.  But for the p2p
> >> aspect it does not matter.
> >>
> >> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> >> <av...@gmail.com> wrote:
> >> > Doing what you are suggesting scares the hell out of me for a couple
> of
> >> > reasons -  All work in P2P be it random/OPHF does the token handling
> the
> >> way
> >> > it is setup. I cannot try something that has not been well explored in
> >> > academia. I insist this must be doable. I am going to think about this
> >> more.
> >> >
> >> > Avinash
> >> >
> >> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >> >
> >> >> Avinash already commited his new order-preserving hash function and I
> >> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> >> >> approach that Todd and I discussed back in January: turn the string
> >> >> into a base-Char.MAX_VALUE number.
> >> >> (
> >> >>
> >>
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> >> >> ).
> >> >>  I chatted with Avinash a little on IM but we didn't finish, so I'm
> >> >> picking it up here.
> >> >>
> >> >> There are two problems with this approach.  One is that the hashes
> >> >> will only be order-preserving for a subset of unicode (all of UCS-2,
> >> >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2
> ).
> >> >>  The other is that this only gives you a naive ordering by code point
> >> >> value, which for unicode is not what you want and even for ascii
> >> >> sometimes you want another collation.  (see
> >> >> http://www.unicode.org/reports/tr10/ and
> >> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> >> >>
> >> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> >> >> you are going to do range queries on them.  The correct
> >> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> >> char value gives ['--a', '--b', 'a', 'b'].
> >> >>
> >> >> Switching to a more flexibile system like the one I wrote for
> >> >> CASSANDRA-3 will let use use Token<BigInteger> for random
> distribution
> >> >> or Token<String> for order-preserving, with user-defined collation.
>  I
> >> >> don't see a way to get this kind of flexibility from an approach that
> >> >> insists on turning everything into BigInteger.
> >> >>
> >> >> -Jonathan
> >> >>
> >> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com>
> >> wrote:
> >> >> > Avinash,
> >> >> >
> >> >> > You mentioned that you have a new order-preserving hash function
> that
> >> >> > you think will be more generally useful.  Can you post it?
> >> >> >
> >> >> > thanks,
> >> >> >
> >> >> > -Jonathan
> >> >> >
> >> >>
> >> >
> >>
> >
>

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
you are going to do range queries on them.  The correct
collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
char value gives ['--a', '--b', 'a', 'b'].

Attached is a test program illustrating this.  The assert will fail
because the hash ordering is not the same as the collator's.

-Jonathan

On Wed, Apr 1, 2009 at 3:47 PM, Avinash Lakshman
<av...@gmail.com> wrote:
> In fact what would you want is hash to respect oorder based on how the
> strings are sorted. For the example you have it does. So am I missing
> something?
>
> Avinash
>
> On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> For that aspect no difference between a String ring based on compareTo
>> and a BigInteger one.  The only difference (and it is an important one
>> for the reasons I gave!) is how the compare works.  But for the p2p
>> aspect it does not matter.
>>
>> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
>> <av...@gmail.com> wrote:
>> > Doing what you are suggesting scares the hell out of me for a couple of
>> > reasons -  All work in P2P be it random/OPHF does the token handling the
>> way
>> > it is setup. I cannot try something that has not been well explored in
>> > academia. I insist this must be doable. I am going to think about this
>> more.
>> >
>> > Avinash
>> >
>> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>> >
>> >> Avinash already commited his new order-preserving hash function and I
>> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
>> >> approach that Todd and I discussed back in January: turn the string
>> >> into a base-Char.MAX_VALUE number.
>> >> (
>> >>
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>> >> ).
>> >>  I chatted with Avinash a little on IM but we didn't finish, so I'm
>> >> picking it up here.
>> >>
>> >> There are two problems with this approach.  One is that the hashes
>> >> will only be order-preserving for a subset of unicode (all of UCS-2,
>> >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
>> >>  The other is that this only gives you a naive ordering by code point
>> >> value, which for unicode is not what you want and even for ascii
>> >> sometimes you want another collation.  (see
>> >> http://www.unicode.org/reports/tr10/ and
>> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>> >>
>> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
>> >> you are going to do range queries on them.  The correct
>> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>> >> char value gives ['--a', '--b', 'a', 'b'].
>> >>
>> >> Switching to a more flexibile system like the one I wrote for
>> >> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
>> >> or Token<String> for order-preserving, with user-defined collation.  I
>> >> don't see a way to get this kind of flexibility from an approach that
>> >> insists on turning everything into BigInteger.
>> >>
>> >> -Jonathan
>> >>
>> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com>
>> wrote:
>> >> > Avinash,
>> >> >
>> >> > You mentioned that you have a new order-preserving hash function that
>> >> > you think will be more generally useful.  Can you post it?
>> >> >
>> >> > thanks,
>> >> >
>> >> > -Jonathan
>> >> >
>> >>
>> >
>>
>

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

In fact what would you want is hash to respect oorder based on how the
strings are sorted. For the example you have it does. So am I missing
something?

Avinash

On Wed, Apr 1, 2009 at 2:43 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> For that aspect no difference between a String ring based on compareTo
> and a BigInteger one.  The only difference (and it is an important one
> for the reasons I gave!) is how the compare works.  But for the p2p
> aspect it does not matter.
>
> On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
> <av...@gmail.com> wrote:
> > Doing what you are suggesting scares the hell out of me for a couple of
> > reasons -  All work in P2P be it random/OPHF does the token handling the
> way
> > it is setup. I cannot try something that has not been well explored in
> > academia. I insist this must be doable. I am going to think about this
> more.
> >
> > Avinash
> >
> > On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >
> >> Avinash already commited his new order-preserving hash function and I
> >> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> >> approach that Todd and I discussed back in January: turn the string
> >> into a base-Char.MAX_VALUE number.
> >> (
> >>
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> >> ).
> >>  I chatted with Avinash a little on IM but we didn't finish, so I'm
> >> picking it up here.
> >>
> >> There are two problems with this approach.  One is that the hashes
> >> will only be order-preserving for a subset of unicode (all of UCS-2,
> >> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
> >>  The other is that this only gives you a naive ordering by code point
> >> value, which for unicode is not what you want and even for ascii
> >> sometimes you want another collation.  (see
> >> http://www.unicode.org/reports/tr10/ and
> >> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
> >>
> >> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> >> you are going to do range queries on them.  The correct
> >> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> >> char value gives ['--a', '--b', 'a', 'b'].
> >>
> >> Switching to a more flexibile system like the one I wrote for
> >> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
> >> or Token<String> for order-preserving, with user-defined collation.  I
> >> don't see a way to get this kind of flexibility from an approach that
> >> insists on turning everything into BigInteger.
> >>
> >> -Jonathan
> >>
> >> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com>
> wrote:
> >> > Avinash,
> >> >
> >> > You mentioned that you have a new order-preserving hash function that
> >> > you think will be more generally useful.  Can you post it?
> >> >
> >> > thanks,
> >> >
> >> > -Jonathan
> >> >
> >>
> >
>

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

For that aspect no difference between a String ring based on compareTo
and a BigInteger one.  The only difference (and it is an important one
for the reasons I gave!) is how the compare works.  But for the p2p
aspect it does not matter.

On Wed, Apr 1, 2009 at 3:40 PM, Avinash Lakshman
<av...@gmail.com> wrote:
> Doing what you are suggesting scares the hell out of me for a couple of
> reasons -  All work in P2P be it random/OPHF does the token handling the way
> it is setup. I cannot try something that has not been well explored in
> academia. I insist this must be doable. I am going to think about this more.
>
> Avinash
>
> On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>
>> Avinash already commited his new order-preserving hash function and I
>> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
>> approach that Todd and I discussed back in January: turn the string
>> into a base-Char.MAX_VALUE number.
>> (
>> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
>> ).
>>  I chatted with Avinash a little on IM but we didn't finish, so I'm
>> picking it up here.
>>
>> There are two problems with this approach.  One is that the hashes
>> will only be order-preserving for a subset of unicode (all of UCS-2,
>> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
>>  The other is that this only gives you a naive ordering by code point
>> value, which for unicode is not what you want and even for ascii
>> sometimes you want another collation.  (see
>> http://www.unicode.org/reports/tr10/ and
>> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>>
>> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
>> you are going to do range queries on them.  The correct
>> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
>> char value gives ['--a', '--b', 'a', 'b'].
>>
>> Switching to a more flexibile system like the one I wrote for
>> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
>> or Token<String> for order-preserving, with user-defined collation.  I
>> don't see a way to get this kind of flexibility from an approach that
>> insists on turning everything into BigInteger.
>>
>> -Jonathan
>>
>> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com> wrote:
>> > Avinash,
>> >
>> > You mentioned that you have a new order-preserving hash function that
>> > you think will be more generally useful.  Can you post it?
>> >
>> > thanks,
>> >
>> > -Jonathan
>> >
>>
>

Re: OPHF

Posted by Avinash Lakshman <av...@gmail.com>.

Doing what you are suggesting scares the hell out of me for a couple of
reasons -  All work in P2P be it random/OPHF does the token handling the way
it is setup. I cannot try something that has not been well explored in
academia. I insist this must be doable. I am going to think about this more.

Avinash

On Wed, Apr 1, 2009 at 2:31 PM, Jonathan Ellis <jb...@gmail.com> wrote:

> Avinash already commited his new order-preserving hash function and I
> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> approach that Todd and I discussed back in January: turn the string
> into a base-Char.MAX_VALUE number.
> (
> http://groups.google.com/group/cassandra-dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed
> ).
>  I chatted with Avinash a little on IM but we didn't finish, so I'm
> picking it up here.
>
> There are two problems with this approach.  One is that the hashes
> will only be order-preserving for a subset of unicode (all of UCS-2,
> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
>  The other is that this only gives you a naive ordering by code point
> value, which for unicode is not what you want and even for ascii
> sometimes you want another collation.  (see
> http://www.unicode.org/reports/tr10/ and
> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>
> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> you are going to do range queries on them.  The correct
> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> char value gives ['--a', '--b', 'a', 'b'].
>
> Switching to a more flexibile system like the one I wrote for
> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
> or Token<String> for order-preserving, with user-defined collation.  I
> don't see a way to get this kind of flexibility from an approach that
> insists on turning everything into BigInteger.
>
> -Jonathan
>
> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com> wrote:
> > Avinash,
> >
> > You mentioned that you have a new order-preserving hash function that
> > you think will be more generally useful.  Can you post it?
> >
> > thanks,
> >
> > -Jonathan
> >
>

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Apr 2, 2009 at 11:09 AM, Jun Rao <ju...@almaden.ibm.com> wrote:
>
> If the partitioner maps String->token, then the load balancer can be
> oblivious to the partitioner and do the balancing by reassigning tokens to
> EndPoint. On the other hand, if the partitioner maps String->EndPoint, then
> the load balancer has to tell the partitioner that it has changed some of
> the decision the partitioner already made. How will this work?

The partitioner maps Token -> EndPoint.  (And Key -> Token.)  So like
I said, I'm not sure exactly how load balancing will work but it will
be token-agnostic.

-Jonathan

Re: OPHF

Posted by Zhu Han <sc...@gmail.com>.

Jun,

This has be implemented in cassandra. When a node is added to the system. it
would gossip it's token to other peers. These peers would re-compute the
range to endpoint map after receiving the notification from the Gossiper.
And then the ranges would be transfered from one endpoint to another if the
mapping is changed for it. You can check LeaveJoinProtocolImpl.Java for the
implementation.

best regards,
hanzhu


On Fri, Apr 3, 2009 at 1:09 AM, Jun Rao <ju...@almaden.ibm.com> wrote:

>
> If the partitioner maps String->token, then the load balancer can be
> oblivious to the partitioner and do the balancing by reassigning tokens to
> EndPoint. On the other hand, if the partitioner maps String->EndPoint, then
> the load balancer has to tell the partitioner that it has changed some of
> the decision the partitioner already made. How will this work?
>
> Jun
> IBM Almaden Research Center
> K55/B1, 650 Harry Road, San Jose, CA  95120-6099
>
> junrao@almaden.ibm.com
>
>
> Jonathan Ellis <jb...@gmail.com> wrote on 04/02/2009 09:51:26 AM:
>
> >
> > On Thu, Apr 2, 2009 at 10:38 AM, Jun Rao <ju...@almaden.ibm.com> wrote:
> > >
> > > Johnathan,
> > >
> > > Could you explain a bit more how the pluggable partitioner (that
> provides
> > > String-> EndPoint mapping) works?
> > >
> > > In particular,
> > > 1. How does the partitioner obtain the list of EndPoints in the system?
> >
> > Same way it does now, through Gossip.  The main change is from
> >
> > private Map<BigInteger, EndPoint> tokenToEndPointMap_
> >
> > to
> >
> > private Map<Token, EndPoint> tokenToEndPointMap_
> >
> > where Token can either be implemented as a wrapper around BigIntegeror
> String.
> >
> > > 2. What happens when EndPoints are added/removed from the system? Does
> the
> > > partitioner have to remember the history of the EndPoints?
> >
> > Ditto.
> >
> > > 3. How does the partitioner coordinate with the load-balancer? Suppose
> some
> > > rows are moved from one EndPoint to another. How is that information
> > > communicated and used by the partitioner so that routing remains
> correct?
> >
> > Load balancing isn't complete yet; I didn't mean to imply otherwise.
> > So I don't know for sure.  Avinash has probably thought about this
> > more.  But having Token be abstract shouldn't affect the algorithm.
> >
> > -Jonathan
>

Re: OPHF

Posted by Jun Rao <ju...@almaden.ibm.com>.

If the partitioner maps String->token, then the load balancer can be
oblivious to the partitioner and do the balancing by reassigning tokens to
EndPoint. On the other hand, if the partitioner maps String->EndPoint, then
the load balancer has to tell the partitioner that it has changed some of
the decision the partitioner already made. How will this work?

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com


Jonathan Ellis <jb...@gmail.com> wrote on 04/02/2009 09:51:26 AM:

>
> On Thu, Apr 2, 2009 at 10:38 AM, Jun Rao <ju...@almaden.ibm.com> wrote:
> >
> > Johnathan,
> >
> > Could you explain a bit more how the pluggable partitioner (that
provides
> > String-> EndPoint mapping) works?
> >
> > In particular,
> > 1. How does the partitioner obtain the list of EndPoints in the system?
>
> Same way it does now, through Gossip.  The main change is from
>
> private Map<BigInteger, EndPoint> tokenToEndPointMap_
>
> to
>
> private Map<Token, EndPoint> tokenToEndPointMap_
>
> where Token can either be implemented as a wrapper around BigIntegeror
String.
>
> > 2. What happens when EndPoints are added/removed from the system? Does
the
> > partitioner have to remember the history of the EndPoints?
>
> Ditto.
>
> > 3. How does the partitioner coordinate with the load-balancer? Suppose
some
> > rows are moved from one EndPoint to another. How is that information
> > communicated and used by the partitioner so that routing remains
correct?
>
> Load balancing isn't complete yet; I didn't mean to imply otherwise.
> So I don't know for sure.  Avinash has probably thought about this
> more.  But having Token be abstract shouldn't affect the algorithm.
>
> -Jonathan

Re: OPHF

Posted by Jonathan Ellis <jb...@gmail.com>.

On Thu, Apr 2, 2009 at 10:38 AM, Jun Rao <ju...@almaden.ibm.com> wrote:
>
> Johnathan,
>
> Could you explain a bit more how the pluggable partitioner (that provides
> String-> EndPoint mapping) works?
>
> In particular,
> 1. How does the partitioner obtain the list of EndPoints in the system?

Same way it does now, through Gossip.  The main change is from

private Map<BigInteger, EndPoint> tokenToEndPointMap_

to

private Map<Token, EndPoint> tokenToEndPointMap_

where Token can either be implemented as a wrapper around BigInteger or String.

> 2. What happens when EndPoints are added/removed from the system? Does the
> partitioner have to remember the history of the EndPoints?

Ditto.

> 3. How does the partitioner coordinate with the load-balancer? Suppose some
> rows are moved from one EndPoint to another. How is that information
> communicated and used by the partitioner so that routing remains correct?

Load balancing isn't complete yet; I didn't mean to imply otherwise.
So I don't know for sure.  Avinash has probably thought about this
more.  But having Token be abstract shouldn't affect the algorithm.

-Jonathan

Re: OPHF

Posted by Jun Rao <ju...@almaden.ibm.com>.

Johnathan,

Could you explain a bit more how the pluggable partitioner (that provides
String-> EndPoint mapping) works?

In particular,
1. How does the partitioner obtain the list of EndPoints in the system?
2. What happens when EndPoints are added/removed from the system? Does the
partitioner have to remember the history of the EndPoints?
3. How does the partitioner coordinate with the load-balancer? Suppose some
rows are moved from one EndPoint to another. How is that information
communicated and used by the partitioner so that routing remains correct?

The answers may be found from your code. However, it will be helpful if you
can provide a high-level description here.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com


Jonathan Ellis <jb...@gmail.com> wrote on 04/01/2009 02:31:58 PM:

>
> Avinash already commited his new order-preserving hash function and I
> missed it.  It's in OrderPreservingHashPartitioner.  It takes the
> approach that Todd and I discussed back in January: turn the string
> into a base-Char.MAX_VALUE number.
> (http://groups.google.com/group/cassandra-
> dev/browse_thread/thread/6bda8518466210e7/f53b79c19010a9ed).
>  I chatted with Avinash a little on IM but we didn't finish, so I'm
> picking it up here.
>
> There are two problems with this approach.  One is that the hashes
> will only be order-preserving for a subset of unicode (all of UCS-2,
> but not all of UTF-16; see http://en.wikipedia.org/wiki/UTF-16/UCS-2).
>  The other is that this only gives you a naive ordering by code point
> value, which for unicode is not what you want and even for ascii
> sometimes you want another collation.  (see
> http://www.unicode.org/reports/tr10/ and
> http://java.sun.com/javase/6/docs/api/java/text/Collator.html).
>
> Say for instance you have inserted keys ['a', 'b', '--a', '--b'] and
> you are going to do range queries on them.  The correct
> collation-aware sort is ['a', '--a', 'b', '--b'].  But ordering by
> char value gives ['--a', '--b', 'a', 'b'].
>
> Switching to a more flexibile system like the one I wrote for
> CASSANDRA-3 will let use use Token<BigInteger> for random distribution
> or Token<String> for order-preserving, with user-defined collation.  I
> don't see a way to get this kind of flexibility from an approach that
> insists on turning everything into BigInteger.
>
> -Jonathan
>
> On Mon, Mar 30, 2009 at 2:16 PM, Jonathan Ellis <jb...@gmail.com>
wrote:
> > Avinash,
> >
> > You mentioned that you have a new order-preserving hash function that
> > you think will be more generally useful.  Can you post it?
> >
> > thanks,
> >
> > -Jonathan
> >