Posted to user@cassandra.apache.org by Drew Kutcharian <dr...@venarc.com> on 2011/12/20 04:01:00 UTC

Choosing a Partitioner Type for Random java.util.UUID Row Keys

Hey Guys,

I just came across http://wiki.apache.org/cassandra/ByteOrderedPartitioner and it got me thinking. If the row keys are java.util.UUID values which are generated randomly (and securely), then what type of partitioner would be best? Since the key values are already random, would it make a difference to use RandomPartitioner, or can one use ByteOrderedPartitioner or OrderPreservingPartitioner as well and get the same result?

-- Drew

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by aaron morton <aa...@thelastpickle.com>.

No problems. 

IMHO you should develop a sizable bruise banging your head against using standard CFs and the RandomPartitioner before using something else.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

Thanks, that definitely has advantages over using a super column. We
ran into Thrift timeouts when the super column got large, and with the
super column range query there is no way (AFAIK) to batch the request at
the subcolumn level.

-Bryce

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by aaron morton <aa...@thelastpickle.com>.

AFAIK there are no plans to kill the BOP, but I would still try to make your life easier by using the RP.

My understanding of the problem is at certain times you snapshot the files in a dir; and the main query you want to handle is "At what points between time t0 and time t1 did files x,y and z exist?".

You could consider:

1) Partitioning the time series data across rows, and making the row key the timestamp for the start of the partition. If you have rollup partitions, consider making the row key <timestamp : partition_size>, e.g. <123456789 : "1d"> for a 1 day partition that starts at 123456789.
2) In each row, using column names of the form <timestamp : file_name>, where timestamp is the time of the snapshot.

To query between two times (t0 and t1):

1) Determine which partitions the time span covers; this will give you a list of rows.
2) Execute a multi-get slice for all of those rows, using <t0:*> and <t1:*> as the slice start and finish. (I'm using * here as a null; check with your client to see how to use composite columns.)
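
A minimal sketch of the two query steps above, in plain Python with no Cassandra client involved. The 1 day partition size, the helper names (`partition_start`, `partitions_covering`, `slice_bounds`, `multiget_slice`) and the modelling of a composite column name as a `(timestamp, file_name)` tuple are all assumptions made for illustration; a real client would issue the multi-get slice itself and has its own way of expressing the open-ended `*` component.

```python
DAY = 86400            # assumed partition size in seconds (the "1d" rollup)
AFTER_ALL = "\uffff"   # stand-in for "sorts after every file name"

def partition_start(ts, size=DAY):
    """Row key for the partition containing ts: the partition's start time."""
    return ts - (ts % size)

def partitions_covering(t0, t1, size=DAY):
    """Step 1: every partition row key touched by the span [t0, t1]."""
    first, last = partition_start(t0, size), partition_start(t1, size)
    return list(range(first, last + 1, size))

def slice_bounds(t0, t1):
    """Step 2: composite slice start and finish, i.e. <t0:*> and <t1:*>."""
    return (t0, ""), (t1, AFTER_ALL)

def multiget_slice(rows, row_keys, start, finish):
    """Emulates a multi-get slice over in-memory rows (a dict of dicts)."""
    return {
        key: sorted(col for col in rows.get(key, {}) if start <= col <= finish)
        for key in row_keys
    }

# Four snapshot entries spread over three 1 day partitions.
rows = {}
for ts, name in [(100, "x"), (100, "y"), (90000, "x"), (180000, "z")]:
    rows.setdefault(partition_start(ts), {})[(ts, name)] = b""

keys = partitions_covering(50, 100000)
start, finish = slice_bounds(50, 100000)
result = multiget_slice(rows, keys, start, finish)
```

Filtering the returned columns down to files x, y and z would still happen on the client in this sketch, since each slice covers one contiguous column range.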

Hope that helps. 
Aaron


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

I wasn't aware of CompositeColumns, thanks for the tip. However I think
it still doesn't allow me to do the query I need - basically I need to
do a timestamp range query, limiting only to certain file names at
each timestamp. With BOP and a separate row for each timestamp,
prefixed by a random UUID, and file names as column names, I can do this
query. With CompositeColumns, I can only query one contiguous range, so
I'd have to know the timestamps beforehand to limit the file names. I
can resolve this using indexes, but on paper it looks like this would be
significantly slower (it would take me 5 round trips instead of 3 to
complete each query, and the query is made multiple times on every
single client request).

The two downsides I've seen listed for BOP are balancing issues and
hotspots. I can understand why RP is recommended, from the balancing
issues alone. However these aren't problems for my application. Is
there anything else I am missing? Does the Cassandra team plan on
continuing to support BOP? I haven't completely ruled out RP, but I
like having BOP as an option; it opens up interesting modeling
alternatives that I think have real advantages for some
(if uncommon) applications.

Thanks,
Bryce

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by aaron morton <aa...@thelastpickle.com>.

Bryce, 
	Have you considered using CompositeColumns and a standard CF? The row key is the UUID and the column name is (timestamp : dir_entry); you can then slice all columns with a particular timestamp.
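
As a rough illustration of that layout (not actual client code), the sketch below models one row as a sorted list of `(timestamp, dir_entry)` tuples and slices out a single timestamp with `bisect`. The tuple model, the integer timestamps and the `slice_timestamp` helper are assumptions; a composite comparator on the server would do the equivalent ordering.

```python
from bisect import bisect_left

# One row (keyed by the UUID). Column names are (timestamp, dir_entry)
# tuples kept in sorted order, mimicking a composite comparator.
row = sorted([
    (1000, "a.txt"), (1000, "b.txt"),
    (2000, "a.txt"),
    (3000, "a.txt"), (3000, "c.txt"),
])

def slice_timestamp(columns, ts):
    """Return every column whose first component equals ts."""
    lo = bisect_left(columns, (ts,))      # (ts,) sorts before (ts, anything)
    hi = bisect_left(columns, (ts + 1,))  # first column of the next timestamp
    return columns[lo:hi]

snapshot = slice_timestamp(row, 1000)
```

Because the columns for one timestamp are contiguous, the slice is a single range read; picking specific file names across many timestamps is not, which is the limitation raised in the reply to this message.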

	Even if you have a random key, I would use the RP unless you have an extreme use case. 

 Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by Bryce Allen <ba...@ci.uchicago.edu>.

I think it comes down to how much you benefit from row range scans, and
how confident you are that going forward all data will continue to use
random row keys.

I'm considering using BOP as a way of working around the non-indexed
super column limitation. In my current schema, row keys are random
UUIDs, super column names are timestamps, and columns contain a
snapshot in time of directory contents, and could be quite large. If
instead I use row keys that are (uuid)-(timestamp), and use a standard
column family, I can do a row range query and select only specific
columns. I'm still evaluating if I can do this with BOP - ideally the
token would just use the first 128 bits of the key, and I haven't found
any documentation on how it compares keys of different length.
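
A small sketch of how such keys would order, under the assumption (not confirmed in this thread) that BOP compares raw key bytes lexicographically as unsigned values. Python's `bytes` comparison behaves that way, with a shorter key sorting before any longer key it is a prefix of. The `row_key` and `row_range` helpers, the 8-byte big-endian timestamp encoding and the omission of the "-" separator are illustrative choices.

```python
import struct
import uuid

def row_key(u, ts):
    """(uuid)(timestamp) key: 16 UUID bytes, then an 8-byte big-endian timestamp."""
    return u.bytes + struct.pack(">Q", ts)

def row_range(u, t0, t1):
    """Start and end keys for a row range scan over one UUID between t0 and t1."""
    return row_key(u, t0), row_key(u, t1)

u1 = uuid.UUID("00000000-0000-4000-8000-000000000001")
u2 = uuid.UUID("ffffffff-ffff-4fff-bfff-ffffffffffff")

# Byte order groups keys by UUID first, then by timestamp within a UUID.
keys = sorted(row_key(u, ts) for u in (u2, u1) for ts in (300, 100, 200))

start, end = row_range(u1, 100, 200)
in_range = [k for k in keys if start <= k <= end]
```

Big-endian encoding matters here: it makes numeric timestamp order agree with byte order, so a range scan between two keys returns the timestamps in between.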

Another trick with BOP is to use MD5(rowkey)-rowkey for data that has
non-uniform row keys. I think it's reasonable to use if most data is
uniform and benefits from range scans, but a few things are added that
aren't/don't. This trick does make the keys larger, which increases
storage cost and IO load, so it's probably a bad idea if a significant
subset of the data requires it.
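
The MD5(rowkey)-rowkey trick can be sketched as follows. The helper names `bop_key` and `original_key`, and the `userNNNN` sample keys, are hypothetical; the point is only that the 16-byte digest prefix spreads otherwise clustered keys across the byte range, at the cost of 16 extra bytes per key.

```python
import hashlib

def bop_key(row_key: bytes) -> bytes:
    """MD5(rowkey)-rowkey: 16 digest bytes, then the original key."""
    return hashlib.md5(row_key).digest() + row_key

def original_key(stored: bytes) -> bytes:
    """Recover the original key by dropping the 16-byte MD5 prefix."""
    return stored[16:]

# Non-uniform keys: every one starts with the same bytes.
sequential = [f"user{n:04d}".encode() for n in range(1000)]
prefixed = [bop_key(k) for k in sequential]

# Raw keys share a single first byte; prefixed keys use many different ones.
raw_first_bytes = {k[0] for k in sequential}
prefixed_first_bytes = {k[0] for k in prefixed}
```

Range scans over the prefixed keys no longer follow the original key order, which is why this only makes sense for the subset of data that does not need them.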

Disclaimer - I wrote that wiki article to fill in a documentation gap,
since there were no examples of BOP and I wasted a lot of time before I
noticed the hex byte array vs decimal distinction for specifying the
initial tokens (which to be fair is documented, just easy to miss on a
skim). I'm also new to Cassandra; I'm just describing what makes sense
to me "on paper". FWIW I confirmed that random UUIDs (type 4) row keys
really do evenly distribute when using BOP.
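
A quick way to reproduce that check offline, as a sketch rather than a measurement of a real cluster. It assumes a hypothetical 4-node ring, evenly spaced 16-byte hex tokens for BOP, and a simplified ownership rule in which node i owns the keys from its own token up to the next one. `rp_initial_tokens` shows the decimal form commonly used for RandomPartitioner (i * 2**127 / N) for contrast with the hex form.

```python
import random
import uuid
from collections import Counter

NODES = 4  # hypothetical ring size

def bop_initial_tokens(nodes, width=16):
    """Evenly spaced tokens over a width-byte key space, as hex strings."""
    space = 1 << (8 * width)
    return [format(i * space // nodes, "0{}x".format(2 * width)) for i in range(nodes)]

def rp_initial_tokens(nodes):
    """Evenly spaced RandomPartitioner-style tokens, as decimal strings."""
    return [str(i * (1 << 127) // nodes) for i in range(nodes)]

def owning_node(key, tokens):
    """Index of the last token that is <= key (simplified ownership rule)."""
    raw = [bytes.fromhex(t) for t in tokens]
    return max(i for i, t in enumerate(raw) if t <= key)

rng = random.Random(42)  # seeded so the run is repeatable
tokens = bop_initial_tokens(NODES)
counts = Counter(
    owning_node(uuid.UUID(int=rng.getrandbits(128), version=4).bytes, tokens)
    for _ in range(20000)
)
```

The version and variant bits of a type 4 UUID sit in the middle of the value, so the leading bytes that decide ownership here are fully random.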

-Bryce

Re: Choosing a Partitioner Type for Random java.util.UUID Row Keys

Posted by Filipe Gonçalves <th...@gmail.com>.

Generally, RandomPartitioner is the recommended one.
If you already provide randomized keys, it doesn't make much of a
difference; the nodes should be balanced with any partitioner.
However, unless you have UUIDs in all keys of all column families
(highly unlikely), ByteOrderedPartitioner and
OrderPreservingPartitioner will lead to hotspots and unbalanced
rings.

-- 
Filipe Gonçalves