Posted to user@cassandra.apache.org by S Ahmed <sa...@gmail.com> on 2010/07/14 22:44:56 UTC

key types and grouping related rows together

Where is the link that describes the various key types and their impact on
sorting? (I believe I read it before, can't seem to find it now).

So my application supports multi-tenants, so I need the keys to represent
things like:

website1123 + contentID

or

website3454 + userID

And for range queries, these keys have to be grouped together obviously.

What key type would be best suited for this?


I might have to create a CF that maps the website and its key prefix?

Re: key types and grouping related rows together

Posted by Aaron Morton <aa...@thelastpickle.com>.
Yes, you need to maintain the secondary index yourself. Send a batch_mutate and write the article and website article columns at the same time.

I think you're safe up to a large number of cols, say 1M. I'm not sure; I may try to track the info down one day.

A

On 16 Jul, 2010,at 03:39 PM, S Ahmed <sa...@gmail.com> wrote:

>
>
> So am I to keep track of the # of columns for a given key in CF WebsiteArticle? i.e. if I want to do a get_slice for the first 10 OR last 10 (I would need to know the count to get the last 10).
>
> >> Am assuming RP. There are some recommendations on the number of cols per key, in the millions I think I can never find it when I want it.
> So I would have to potentially split the columns to another key then, correct? i.e. website_id1, website_id1-2
>
>
> On Thu, Jul 15, 2010 at 8:17 PM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>     You could build a secondary index, e.g.
>
>     CF
>     Articles : {
>     article_id1 : {}
>     article_id2 : {}
>     }
>
>     CF
>     WebsiteArticle : {
>     website_id1 : { time_uuid : article_id1, time_uuid2 : article_id2}
>     }
>
>     when you want to get the last 10 for a website, get_slice from the WebsiteArticle CF then multi get from Articles.
>
>     Am assuming RP. There are some recommendations on the number of cols per key, in the millions I think I can never find it when I want it.
>
>     You could try a key of "website_id.timestamp" and try a get range using the Random Partitioner. The results will be unordered, but that's OK so long as you get the ones you want.
>
>     Aaron
>
>
>
>     On 16 Jul, 2010,at 09:08 AM, S Ahmed <sa...@gmail.com> wrote:
>
>>     Given a CF like:
>>
>>     Articles : {
>>      
>>        key1 : { title:"some title", body: "this is my article body...", .... },
>>        key2 : { title:"some title", body: "this is my article body...", .... }
>>     }
>>
>>     Now these articles could be for different websites eg. www.website1.com, www.website2.com
>>
>>     If I want to get the latest 10 articles for a given website, how would I formulate my key to achieve this?
>>
>>     I basically need to understand how to handle multi-tenancy, b/c I will need to do this for almost all my CF's. 
>>
>>     I'm a little stuck here so guidance would be great!
>>
>>
>>     On Thu, Jul 15, 2010 at 4:01 PM, S Ahmed <sa...@gmail.com> wrote:
>>
>>         Benjamin,
>>
>>         Ah, thanks for clarifying that
>>
>>         key sorting is changing in .7 I believe to support a binary array?
>>
>>
>>         On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>>             Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
>>             applies to _columns_ within rows, _not_ to row keys.
>>
>>
>>             On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
>>             > Where is the link that describes the various key types and their impact on
>>             > sorting? (I believe I read it before, can't seem to find it now).
>>             > So my application supports multi-tenants, so I need the keys to represent
>>             > things like:
>>             > website1123 + contentID
>>             > or
>>             > website3454 + userID
>>             > And for range queries, these keys have to be grouped together obviously.
>>             > What key type would be best suited for this?
>>             >
>>             >
>>             > I might have to create a CF that maps the website and its key prefix?
>>
>>
>>
>

Re: key types and grouping related rows together

Posted by S Ahmed <sa...@gmail.com>.
So am I to keep track of the # of columns for a given key in CF
WebsiteArticle? i.e. if I want to do a get_slice for the first 10 OR last
10 (I would need to know the count to get the last 10).

> Am assuming RP. There are some recommendations on the number of cols per
> key, in the millions I think I can never find it when I want it.

So I would have to potentially split the columns to another key then,
correct? i.e. website_id1, website_id1-2
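
To make the splitting idea concrete, here is a minimal sketch of how the row key could be derived. It assumes the application keeps a running column index per website, and the class name, method name, and the 1,000,000 cap are all hypothetical (the thread only says "in the millions").

```java
// Hypothetical helper: spreads one website's index columns over several row keys.
final class BucketedRowKey {
    // Assumed cap per row; illustrative only, not a documented Cassandra limit.
    static final long MAX_COLUMNS_PER_ROW = 1_000_000L;

    // Returns "website_id1" for the first bucket, then "website_id1-2", "website_id1-3", ...
    static String forColumnIndex(String websiteId, long columnIndex) {
        long bucket = columnIndex / MAX_COLUMNS_PER_ROW; // 0-based bucket number
        return bucket == 0 ? websiteId : websiteId + "-" + (bucket + 1);
    }

    public static void main(String[] args) {
        System.out.println(forColumnIndex("website_id1", 0L));
        System.out.println(forColumnIndex("website_id1", 1_000_000L));
    }
}
```

The cost of this layout is that a "last 10" read may have to look at two rows when the newest bucket holds fewer than 10 columns.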


On Thu, Jul 15, 2010 at 8:17 PM, Aaron Morton <aa...@thelastpickle.com> wrote:

> You could build a secondary index, e.g.
>
> CF
> Articles : {
> article_id1 : {}
> article_id2 : {}
> }
>
> CF
> WebsiteArticle : {
> website_id1 : { time_uuid : article_id1, time_uuid2 : article_id2}
> }
>
> when you want to get the last 10 for a website, get_slice from the
> WebsiteArticle CF then multi get from Articles.
>
> Am assuming RP. There are some recommendations on the number of cols per
> key, in the millions I think I can never find it when I want it.
>
> You could try a key of "website_id.timestamp" and try a get range using the
> Random Partitioner. The results will be unordered, but that's OK so long as
> you get the ones you want.
>
> Aaron
>
>
> On 16 Jul, 2010,at 09:08 AM, S Ahmed <sa...@gmail.com> wrote:
>
> Given a CF like:
>
> Articles : {
>
>    key1 : { title:"some title", body: "this is my article body...", .... },
>    key2 : { title:"some title", body: "this is my article body...", .... }
> }
>
> Now these articles could be for different websites e.g. www.website1.com,
> www.website2.com
>
> If I want to get the latest 10 articles for a given website, how would I
> formulate my key to achieve this?
>
> I basically need to understand how to handle multi-tenancy, b/c I will need
> to do this for almost all my CF's.
>
> I'm a little stuck here so guidance would be great!
>
>
> On Thu, Jul 15, 2010 at 4:01 PM, S Ahmed <sa...@gmail.com> wrote:
>
>> Benjamin,
>>
>> Ah, thanks for clarifying that.
>>
>> key sorting is changing in .7 I believe to support a binary array?
>>
>>
>> On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black <b...@b3k.us> wrote:
>>
>>> Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
>>> applies to _columns_ within rows, _not_ to row keys.
>>>
>>>
>>> On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
>>> > Where is the link that describes the various key types and their impact
>>> on
>>> > sorting? (I believe I read it before, can't seem to find it now).
>>> > So my application supports multi-tenants, so I need the keys to
>>> represent
>>> > things like:
>>> > website1123 + contentID
>>> > or
>>> > website3454 + userID
>>> > And for range queries, these keys have to be grouped together
>>> obviously.
>>> > What key type would be best suited for this?
>>> >
>>> >
>>> > I might have to create a CF that maps the website and its key prefix?
>>>
>>
>>
>

Re: key types and grouping related rows together

Posted by Aaron Morton <aa...@thelastpickle.com>.
You could build a secondary index, e.g.

CF
Articles : {
article_id1 : {}
article_id2 : {}
}

CF
WebsiteArticle : {
website_id1 : { time_uuid : article_id1, time_uuid2 : article_id2}
}

When you want to get the last 10 for a website, get_slice from the WebsiteArticle CF, then multiget from Articles.
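
As a rough illustration of that read path, here is an in-memory sketch. It is not a Cassandra client: the two maps stand in for the Articles and WebsiteArticle CFs, a long timestamp stands in for the time_uuid column name, and the class and method names are made up for the example. Reading the sorted columns in reverse (the Thrift SliceRange has a reversed flag for this) gives the newest entries first, so no column count is needed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory stand-in for the two column families; names are hypothetical.
final class WebsiteArticleIndex {
    // Articles CF: article id -> article body (one value keeps the sketch short).
    private final Map<String, String> articles = new HashMap<>();
    // WebsiteArticle CF: website id -> (timestamp column name -> article id), sorted by column name.
    private final Map<String, TreeMap<Long, String>> websiteArticle = new HashMap<>();

    // Writes the article and its index column together, as a single batch would.
    // A real time UUID avoids the collisions that two equal long timestamps would cause here.
    void addArticle(String websiteId, long timestamp, String articleId, String body) {
        articles.put(articleId, body);
        websiteArticle.computeIfAbsent(websiteId, k -> new TreeMap<>()).put(timestamp, articleId);
    }

    // Mimics a reversed get_slice with a count, followed by a multiget on Articles.
    List<String> latestArticles(String websiteId, int count) {
        TreeMap<Long, String> row = websiteArticle.getOrDefault(websiteId, new TreeMap<>());
        List<String> bodies = new ArrayList<>();
        for (String articleId : row.descendingMap().values()) {
            if (bodies.size() == count) {
                break;
            }
            bodies.add(articles.get(articleId));
        }
        return bodies;
    }

    public static void main(String[] args) {
        WebsiteArticleIndex index = new WebsiteArticleIndex();
        index.addArticle("website_id1", 1L, "article_id1", "first");
        index.addArticle("website_id1", 2L, "article_id2", "second");
        System.out.println(index.latestArticles("website_id1", 10));
    }
}
```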

I'm assuming RP. There are some recommendations on the number of cols per key, in the millions I think; I can never find them when I want them.

You could try a key of "website_id.timestamp" and try a get range using the Random Partitioner. The results will be unordered, but that's OK so long as you get the ones you want.

Aaron


On 16 Jul, 2010,at 09:08 AM, S Ahmed <sa...@gmail.com> wrote:

> Given a CF like:
>
> Articles : {
>  
>    key1 : { title:"some title", body: "this is my article body...", .... },
>    key2 : { title:"some title", body: "this is my article body...", .... }
> }
>
> Now these articles could be for different websites e.g. www.website1.com, www.website2.com
>
> If I want to get the latest 10 articles for a given website, how would I formulate my key to achieve this?
>
> I basically need to understand how to handle multi-tenancy, b/c I will need to do this for almost all my CF's. 
>
> I'm a little stuck here so guidance would be great!
>
>
> On Thu, Jul 15, 2010 at 4:01 PM, S Ahmed <sa...@gmail.com> wrote:
>
>     Benjamin,
>
>     Ah, thanks for clarifying that.
>
>     key sorting is changing in .7 I believe to support a binary array?
>
>
>     On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black <b...@b3k.us> wrote:
>
>         Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
>         applies to _columns_ within rows, _not_ to row keys.
>
>
>         On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
>         > Where is the link that describes the various key types and their impact on
>         > sorting? (I believe I read it before, can't seem to find it now)
>         > So my application supports multi-tenants, so I need the keys to represent
>         > things like:
>         > website1123 + contentID
>         > or
>         > website3454 + userID
>         > And for range queries, these keys have to be grouped together obviously.
>         > What key type would be best suited for this?
>         >
>         >
>         > I might have to create a CF that maps the website and its key prefix?
>
>
>

Re: key types and grouping related rows together

Posted by S Ahmed <sa...@gmail.com>.
Given a CF like:

Articles : {

   key1 : { title:"some title", body: "this is my article body...", .... },
   key2 : { title:"some title", body: "this is my article body...", .... }
}

Now these articles could be for different websites e.g. www.website1.com,
www.website2.com

If I want to get the latest 10 articles for a given website, how would I
formulate my key to achieve this?

I basically need to understand how to handle multi-tenancy, b/c I will need
to do this for almost all my CF's.

I'm a little stuck here so guidance would be great!


On Thu, Jul 15, 2010 at 4:01 PM, S Ahmed <sa...@gmail.com> wrote:

> Benjamin,
>
> Ah, thanks for clarifying that.
>
> key sorting is changing in .7 I believe to support a binary array?
>
> On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black <b...@b3k.us> wrote:
>
>> Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
>> applies to _columns_ within rows, _not_ to row keys.
>>
>> On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
>> > Where is the link that describes the various key types and their impact
>> on
>> > sorting? (I believe I read it before, can't seem to find it now).
>> > So my application supports multi-tenants, so I need the keys to
>> represent
>> > things like:
>> > website1123 + contentID
>> > or
>> > website3454 + userID
>> > And for range queries, these keys have to be grouped together obviously.
>> > What key type would be best suited for this?
>> >
>> >
>> > I might have to create a CF that maps the website and its key prefix?
>>
>
>

Re: key types and grouping related rows together

Posted by S Ahmed <sa...@gmail.com>.
Benjamin,

Ah, thanks for clarifying that.

Key sorting is changing in 0.7, I believe, to support a binary array?

On Thu, Jul 15, 2010 at 3:26 PM, Benjamin Black <b...@b3k.us> wrote:

> Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
> applies to _columns_ within rows, _not_ to row keys.
>
> On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
> > Where is the link that describes the various key types and their impact
> on
> > sorting? (I believe I read it before, can't seem to find it now).
> > So my application supports multi-tenants, so I need the keys to represent
> > things like:
> > website1123 + contentID
> > or
> > website3454 + userID
> > And for range queries, these keys have to be grouped together obviously.
> > What key type would be best suited for this?
> >
> >
> > I might have to create a CF that maps the website and its key prefix?
>

Re: key types and grouping related rows together

Posted by Benjamin Black <b...@b3k.us>.
Keys are always sorted (in 0.6) as UTF8 strings.  The CompareWith
applies to _columns_ within rows, _not_ to row keys.
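
One practical consequence of string-sorted keys is that numeric parts need zero-padding, otherwise "9" sorts after "10". A minimal sketch, assuming ASCII-only website ids and non-negative timestamps; the key layout and the class name are hypothetical, and the resulting order only matters for range scans when the partitioner preserves key order:

```java
import java.util.Locale;

// Hypothetical key layout: "<websiteId>.<timestamp zero-padded to 19 digits>".
final class StringRowKey {
    // 19 digits is enough for any non-negative long, so padded keys sort numerically.
    static String of(String websiteId, long timestampMillis) {
        return String.format(Locale.ROOT, "%s.%019d", websiteId, timestampMillis);
    }

    public static void main(String[] args) {
        // For ASCII text, String.compareTo gives the same order as comparing UTF-8 bytes.
        System.out.println("website1123.9".compareTo("website1123.10") > 0);
        System.out.println(of("website1123", 9L).compareTo(of("website1123", 10L)) < 0);
    }
}
```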

On Wed, Jul 14, 2010 at 1:44 PM, S Ahmed <sa...@gmail.com> wrote:
> Where is the link that describes the various key types and their impact on
> sorting? (I believe I read it before, can't seem to find it now).
> So my application supports multi-tenants, so I need the keys to represent
> things like:
> website1123 + contentID
> or
> website3454 + userID
> And for range queries, these keys have to be grouped together obviously.
> What key type would be best suited for this?
>
>
> I might have to create a CF that maps the website and its key prefix?

Re: key types and grouping related rows together

Posted by S Ahmed <sa...@gmail.com>.
Do you think a composite key using a key type of Bytes would work?

How many bytes can it be?


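// Row key layout: website id bytes followed by timestamp bytes.
// Bytes is assumed to be a helper such as org.apache.hadoop.hbase.util.Bytes.
public static byte[] createRowKey(int websiteId, long stamp) {
  byte[] websiteIdBytes = Bytes.toBytes(websiteId);
  byte[] stampBytes = Bytes.toBytes(stamp);
  return Bytes.add(websiteIdBytes, stampBytes);
}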

So say this key is used in a ColumnFamily that stores Articles for all
websites. Using a key like this would allow me to get a range of
articles written, ordered by date, for a specific website, correct?
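
To check the ordering claim, here is a self-contained sketch that builds the same 12-byte layout with java.nio.ByteBuffer instead of the Bytes helper and compares keys as unsigned bytes. It assumes the key space orders keys by unsigned byte comparison, which is an assumption about the store, and the class name is made up. Negative ids or timestamps would sort after positive ones under this comparison.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical composite key: 4-byte big-endian website id, then 8-byte big-endian timestamp.
final class CompositeRowKey {
    static byte[] create(int websiteId, long stamp) {
        // ByteBuffer writes big-endian by default, so byte order matches numeric order
        // for non-negative values.
        return ByteBuffer.allocate(Integer.BYTES + Long.BYTES)
                .putInt(websiteId)
                .putLong(stamp)
                .array();
    }

    // Unsigned lexicographic comparison (Java 9+), the order assumed for the key space.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // Same website: the older timestamp sorts first.
        System.out.println(compare(create(1123, 100L), create(1123, 200L)) < 0);
        // Different websites: the website id dominates the timestamp.
        System.out.println(compare(create(1123, 200L), create(3454, 50L)) < 0);
    }
}
```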



On Thu, Jul 15, 2010 at 9:38 AM, S Ahmed <sa...@gmail.com> wrote:

> Well I'm not talking about a specific column family here, as ALL my column
> families will have content that is specific to a certain website, so I need
> a strategy that I will use on almost all my column families.
>
>
> On Wed, Jul 14, 2010 at 9:20 PM, Schubert Zhang <zs...@gmail.com> wrote:
>
>> for your apps, how about this schema:
>>
>> key: website1123
>> columnName: UserID
>> ...
>>
>>
>> On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>>
>>> The key structure you have should group the keys based on the website
>>> There are some differences between range queries with RP and OPP this
>>> article may help
>>>
>>> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>>>
>>> Aaron
>>>
>>>
>>> On 15 Jul, 2010,at 08:44 AM, S Ahmed <sa...@gmail.com> wrote:
>>>
>>> Where is the link that describes the various key types and their impact
>>> on sorting? (I believe I read it before, can't seem to find it now).
>>>
>>> So my application supports multi-tenants, so I need the keys to represent
>>> things like:
>>>
>>> website1123 + contentID
>>>
>>> or
>>>
>>> website3454 + userID
>>>
>>> And for range queries, these keys have to be grouped together obviously.
>>>
>>> What key type would be best suited for this?
>>>
>>>
>>> I might have to create a CF that maps the website and its key prefix?
>>>
>>>
>>
>

Re: key types and grouping related rows together

Posted by S Ahmed <sa...@gmail.com>.
Well I'm not talking about a specific column family here, as ALL my column
families will have content that is specific to a certain website, so I need
a strategy that I will use on almost all my column families.

On Wed, Jul 14, 2010 at 9:20 PM, Schubert Zhang <zs...@gmail.com> wrote:

> for your apps, how about this schema:
>
> key: website1123
> columnName: UserID
> ...
>
>
> On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
>
>> The key structure you have should group the keys based on the website
>> There are some differences between range queries with RP and OPP this
>> article may help
>>
>> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>>
>> Aaron
>>
>>
>> On 15 Jul, 2010,at 08:44 AM, S Ahmed <sa...@gmail.com> wrote:
>>
>> Where is the link that describes the various key types and their impact on
>> sorting? (I believe I read it before, can't seem to find it now).
>>
>> So my application supports multi-tenants, so I need the keys to represent
>> things like:
>>
>> website1123 + contentID
>>
>> or
>>
>> website3454 + userID
>>
>> And for range queries, these keys have to be grouped together obviously.
>>
>> What key type would be best suited for this?
>>
>>
>> I might have to create a CF that maps the website and its key prefix?
>>
>>
>

Re: key types and grouping related rows together

Posted by Thorvaldsson Justus <ju...@svenskaspel.se>.
Don't forget you can write your own sorting algorithm.
Here is a nice tutorial for that.
http://www.sodeso.nl/?p=421

Justus

From: Schubert Zhang [mailto:zsongbo@gmail.com]
Sent: 15 July 2010 04:20
To: user@cassandra.apache.org
Subject: Re: key types and grouping related rows together

for your apps, how about this schema:

key: website1123
columnName: UserID
...
On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton <aa...@thelastpickle.com> wrote:
The key structure you have should group the keys based on the website. There are some differences between range queries with RP and OPP; this article may help:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/

Aaron

On 15 Jul, 2010,at 08:44 AM, S Ahmed <sa...@gmail.com> wrote:
Where is the link that describes the various key types and their impact on sorting? (I believe I read it before, can't seem to find it now).

So my application supports multi-tenants, so I need the keys to represent things like:

website1123 + contentID

or

website3454 + userID

And for range queries, these keys have to be grouped together obviously.

What key type would be best suited for this?


I might have to create a CF that maps the website and its key prefix?


Re: key types and grouping related rows together

Posted by Schubert Zhang <zs...@gmail.com>.
For your app, how about this schema:

key: website1123
columnName: UserID
...

On Thu, Jul 15, 2010 at 6:13 AM, Aaron Morton <aa...@thelastpickle.com> wrote:

> The key structure you have should group the keys based on the website There
> are some differences between range queries with RP and OPP this article may
> help
>
> http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
>
> Aaron
>
>
> On 15 Jul, 2010,at 08:44 AM, S Ahmed <sa...@gmail.com> wrote:
>
> Where is the link that describes the various key types and their impact on
> sorting? (I believe I read it before, can't seem to find it now).
>
> So my application supports multi-tenants, so I need the keys to represent
> things like:
>
> website1123 + contentID
>
> or
>
> website3454 + userID
>
> And for range queries, these keys have to be grouped together obviously.
>
> What key type would be best suited for this?
>
>
> I might have to create a CF that maps the website and its key prefix?
>
>

Re: key types and grouping related rows together

Posted by Aaron Morton <aa...@thelastpickle.com>.
The key structure you have should group the keys based on the website. There are some differences between range queries with RP and OPP; this article may help:
http://ria101.wordpress.com/2010/02/22/cassandra-randompartitioner-vs-orderpreservingpartitioner/
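
As a rough picture of what "grouped" means under an order-preserving layout, the sketch below uses a TreeMap as a stand-in for a sorted key space and pulls out one website's rows by prefix. It is an in-memory illustration only, with made-up names, and it assumes keys never contain the character U+FFFF. Under RP the rows are placed by hash, so this contiguous layout should not be expected there.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// In-memory stand-in for an ordered key space; not a Cassandra client.
final class PrefixRangeScan {
    // Returns every row whose key starts with the prefix, in key order.
    static SortedMap<String, String> scan(TreeMap<String, String> rows, String prefix) {
        // subMap is start-inclusive and end-exclusive; Character.MAX_VALUE sorts after
        // any character that is assumed to follow the prefix.
        return rows.subMap(prefix, prefix + Character.MAX_VALUE);
    }

    public static void main(String[] args) {
        TreeMap<String, String> rows = new TreeMap<>();
        rows.put("website1123.content1", "a");
        rows.put("website1123.content2", "b");
        rows.put("website3454.user1", "c");
        System.out.println(scan(rows, "website1123.").keySet());
    }
}
```

Ending the prefix with a separator such as "." keeps website1123 from also matching a longer id like website11230.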

Aaron


On 15 Jul, 2010,at 08:44 AM, S Ahmed <sa...@gmail.com> wrote:

> Where is the link that describes the various key types and their impact on sorting? (I believe I read it before, can't seem to find it now).
>
> So my application supports multi-tenants, so I need the keys to represent things like:
>
> website1123 + contentID
>
> or
>
> website3454 + userID
>
> And for range queries, these keys have to be grouped together obviously.
>
> What key type would be best suited for this?
>
>
> I might have to create a CF that maps the website and its key prefix?