Posted to user@cassandra.apache.org by Benjamin Waldher <lg...@laserbunny.net> on 2010/12/01 15:33:10 UTC

Sorted Integer -> UUID

I have a fairly simple problem that might require a complicated solution.

I need to store Integer -> UUID in a column family, and be able to query 
(and then paginate) the rows ordered by the integer in descending order. 
This is simple enough if no two rows have the same integer, as the 
integer could be a column name which can easily be sorted. However, in 
my scenario, two rows may have the same Integer value. As such, I would 
need to use the integer as the key in the column family. However, this 
means I must use OrderPreservingPartitioner, which is going to cause a 
huge load imbalance on one of my nodes.

How can I have a sorted set of rows of Integer -> UUID where the integer 
may exist many times?

Re: Sorted Integer -> UUID

Posted by Brandon Williams <dr...@gmail.com>.
On Wed, Dec 1, 2010 at 8:33 AM, Benjamin Waldher <lg...@laserbunny.net> wrote:

> How can I have a sorted set of rows of Integer -> UUID where the integer
> may exist many times?
>

If you need to repeat an integer as a column name, add entropy and use a
BytesType column.  Pack your integer in big endian format, and append
another random packed integer.  When you slice the columns, only unpack the
first 8 bytes.  You still get the desired sorting this way, though on
repeats it will sort by the random portion.

-Brandon
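
A minimal sketch of the packing described above, in plain Python rather
than through any Cassandra client. It assumes the integer is non-negative
and fits in 8 bytes (unsigned 64-bit), since byte-wise comparison only
matches numeric order for unsigned big-endian values. The helper names
make_column_name and read_value are made up for illustration.

```python
import random
import struct

def make_column_name(value: int) -> bytes:
    """Pack value as 8 big-endian bytes, then append a random 8-byte integer."""
    return struct.pack(">QQ", value, random.getrandbits(64))

def read_value(column_name: bytes) -> int:
    """Unpack only the first 8 bytes; the random suffix is ignored."""
    (value,) = struct.unpack(">Q", column_name[:8])
    return value

# Two entries share the integer 7 but still get distinct column names.
names = [make_column_name(v) for v in (7, 300, 7, 2)]

# Byte-wise sorting (what a BytesType comparator does) follows the integer;
# reversing it gives the descending order asked for in the question.
descending = sorted(names, reverse=True)
print([read_value(n) for n in descending])  # [300, 7, 7, 2]
```

The two names for 7 sort next to each other, in an arbitrary but stable
order decided by their random suffixes.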

Re: Sorted Integer -> UUID

Posted by Aaron Morton <aa...@thelastpickle.com>.
Could you use a Super CF?

The super column name is the integer, and the column names are the UUIDs. I'm not sure what your column values or row key are.
There are some limitations to Super CFs, but I do not think they would apply in this case: http://wiki.apache.org/cassandra/CassandraLimitations

You can then slice the super column names (your integers) and get back each super column and all its columns.
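
To make the layout concrete, here is a rough in-memory model of one such
row, using plain dicts rather than a real Cassandra client. The sample
data, the page_descending helper and its exclusive "before" cursor are
assumptions made for illustration; a real slice takes an inclusive start
column instead.

```python
import uuid

# One row of the super CF: super column name (integer) -> {UUID: value}.
super_row = {
    10: {uuid.UUID(int=1): b"", uuid.UUID(int=2): b""},
    7: {uuid.UUID(int=3): b""},
    3: {uuid.UUID(int=4): b"", uuid.UUID(int=5): b""},
}

def page_descending(row, count, before=None):
    """Return up to `count` super columns in descending integer order.

    `before` is an exclusive cursor: pass the last integer of the
    previous page to fetch the next one.
    """
    names = sorted(
        (n for n in row if before is None or n < before), reverse=True
    )
    return [(n, sorted(row[n])) for n in names[:count]]

first = page_descending(super_row, count=2)
second = page_descending(super_row, count=2, before=first[-1][0])
print([n for n, _ in first])   # [10, 7]
print([n for n, _ in second])  # [3]
```

Note that this pages by integer, so a page holds a variable number of
UUIDs when an integer repeats.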

Or you could also use a two CF solution...

Index CF, where your integer is the column name; I'm not sure what your key is. The column value is not important.
Item CF, where the row key is the integer and the column names are the UUIDs; I'm not sure what the column value is.
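
A rough in-memory sketch of the two-CF idea, again with plain dicts
standing in for column families. The index row key "all", the add helper
and the lookup_descending helper are hypothetical. The point it shows is
that the Item CF is only ever read by exact row key, so it does not need
an order-preserving partitioner; the ordering lives in the index row's
column names.

```python
import uuid

index_cf = {}  # index row key -> {integer: b""}   (sorted column names)
item_cf = {}   # integer row key -> {UUID: b""}

def add(index_key, number, item_id):
    """Record the integer in the index row and the UUID in the item row."""
    index_cf.setdefault(index_key, {})[number] = b""
    item_cf.setdefault(number, {})[item_id] = b""

def lookup_descending(index_key, count):
    """Slice the index row in descending order, then fetch item rows by key."""
    numbers = sorted(index_cf.get(index_key, {}), reverse=True)[:count]
    return [(n, u) for n in numbers for u in sorted(item_cf[n])]

add("all", 7, uuid.UUID(int=1))
add("all", 300, uuid.UUID(int=2))
add("all", 7, uuid.UUID(int=3))   # same integer, second UUID
add("all", 2, uuid.UUID(int=4))

top = lookup_descending("all", count=2)
print([n for n, _ in top])  # [300, 7, 7]
```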

Some things to consider...
- Is there a natural grouping to your integers, e.g. by day?
- What is the column value? Will this make for big rows?

Hope that helps. 
Aaron


On 02 Dec, 2010, at 04:56 AM, Daniel Lundin <dl...@eintr.org> wrote:

Unless I misunderstand the question, composing the column names with the
row keys and merging the results would yield something useful.

keyA => (1, uuid), (2, uuid), (3, uuid)
keyB => (1, uuid), (2, uuid), (3, uuid)

Should be transformed into:

(1, keyA, uuid),
(1, keyB, uuid),
(2, keyA, uuid),
(2, keyB, uuid),
(3, keyA, uuid),
(3, keyB, uuid)

map + merge to the rescue.

Re: Sorted Integer -> UUID

Posted by Daniel Lundin <dl...@eintr.org>.
Unless I misunderstand the question, composing the column names with the
row keys and merging the results would yield something useful.

keyA => (1, uuid), (2, uuid), (3, uuid)
keyB => (1, uuid), (2, uuid), (3, uuid)

Should be transformed into:

 (1, keyA, uuid),
 (1, keyB, uuid),
 (2, keyA, uuid),
 (2, keyB, uuid),
 (3, keyA, uuid),
 (3, keyB, uuid)

map + merge to the rescue.
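
One way to read "map + merge" is sketched below in plain Python, assuming
each row's columns already come back sorted by the integer. The row keys
and the placeholder UUID strings are made up, and heapq.merge is just one
possible merge step, not necessarily what was meant.

```python
import heapq

# Each row's columns, already sorted by integer (as a column slice would be).
rows = {
    "keyA": [(1, "uuid-a1"), (2, "uuid-a2"), (3, "uuid-a3")],
    "keyB": [(1, "uuid-b1"), (2, "uuid-b2"), (3, "uuid-b3")],
}

def tag(key, columns):
    """Map step: compose each column name with its row key."""
    for number, item_id in columns:
        yield (number, key, item_id)

# Merge step: lazily interleave the sorted streams into one sorted stream.
merged = list(heapq.merge(*(tag(key, cols) for key, cols in rows.items())))
for entry in merged:
    print(entry)
```

For the descending order asked for in the question, the same call works
with reverse=True, provided each row is sliced in reversed order first.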