Posted to user@cassandra.apache.org by Ertio Lew <er...@gmail.com> on 2012/03/26 23:16:25 UTC

Schema advice/help

I need to store each user's activities on 5 item types. I always want to
read the last 10 activities on each item type for a user (i.e., 50
activities in total per read).

I want to store these activities in a single row per user so that they can
be retrieved with a single-row query, since I always read the last 10
activities on every item type together. I am thinking of creating composite
column names of the form "itemtype":"activityId" (activityId is just a
timestamp value), but then I don't see how to read the last 10 activities
from all item types.

Any ideas for a schema that does this better?

Re: Schema advice/help

Posted by Ertio Lew <er...@gmail.com>.
@R. Verlangen:
You are suggesting keeping a single row for all activities, reading all the
columns from that row, and then filtering, right?

If done that way (instead of keeping it in 5 rows), I would need to
retrieve 100-200 columns from a single row rather than just 50 columns
if I keep them in 5 rows. Which of these two would be better: more columns
from a single row, or fewer columns from multiple rows?
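
To make the trade-off concrete, here is a minimal in-memory sketch of the read-then-filter approach on a single row. It is not Cassandra client code: the list of (activityId, itemType) pairs, the `fetch` size of 200, and the item type names are all assumptions made for illustration.

```python
from collections import defaultdict

def last_per_type(columns, fetch=200, per_type=10):
    """columns: (activityId, itemType) pairs sorted oldest to newest."""
    # One reversed slice: the newest `fetch` columns of the row.
    newest_first = columns[-fetch:][::-1]
    latest = defaultdict(list)
    for activity_id, item_type in newest_first:
        # Client-side filter: keep only the first `per_type` hits per type.
        if len(latest[item_type]) < per_type:
            latest[item_type].append(activity_id)
    return dict(latest)

# 300 synthetic activities: "poll" is rare, "photo" and "video" are frequent.
columns = [
    (ts, "poll" if ts % 50 == 0 else ("photo", "video")[ts % 2])
    for ts in range(1, 301)
]
latest = last_per_type(columns)
```

In this synthetic data the row holds six "poll" activities, but only four of them fall inside the newest 200 columns, so a fixed-size slice can under-fill a rarely used item type. The frequent types are filled to 10 each.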

Re: Schema advice/help

Posted by "R. Verlangen" <ro...@us2.nl>.
You can just get a slice range with "userId:" as the start and no end.

-- 
With kind regards,

Robin Verlangen
www.robinverlangen.nl

Re: Schema advice/help

Posted by Maciej Miklas <ma...@googlemail.com>.
Correct - I also see no other solution for this problem.

Re: Schema advice/help

Posted by Guy Incognito <dn...@gmail.com>.
well, no.  my assumption is that he knows what the 5 itemTypes (or 
appropriate corresponding ids) are, so he can do a known 5-rowkey 
lookup.  if he does not know, then agreed, my proposal is not a great fit.

could do (as originally suggested)

userId -> itemType:activityId

if you want to keep everything in the same row (again assumes that you 
know what the itemTypes are).  but then you can't really do a multiget, 
you have to do 5 separate slice queries, one for each item type.
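
A minimal in-memory sketch of the single-row layout with one slice per item type. The sorted list of (itemType, activityId) tuples stands in for a row whose composite column names are ordered by a composite comparator; the helper names and item types are hypothetical, and this is not Cassandra client code.

```python
import bisect

# Hypothetical row for one user: composite column names kept in sorted order.
row = []

ITEM_TYPES = ["photo", "video", "link", "note", "poll"]  # assumed names

def add_activity(item_type, activity_id):
    bisect.insort(row, (item_type, activity_id))

def last_for_type(item_type, count=10):
    """One reversed slice: newest `count` columns whose first component matches."""
    lo = bisect.bisect_left(row, (item_type,))
    hi = bisect.bisect_right(row, (item_type, float("inf")))
    window = row[max(lo, hi - count):hi]
    return [activity_id for _, activity_id in window][::-1]

for ts in range(1, 16):
    add_activity("photo", ts)
for ts in range(1, 4):
    add_activity("video", ts)

# Five separate slices against the same row, one per known item type.
latest = {item_type: last_for_type(item_type) for item_type in ITEM_TYPES}
```

Each slice is bounded by the item type prefix, which is why the item types have to be known up front.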

can also do some wacky stuff around maintaining a row that explicitly
only holds the last 10 items by itemType (meaning you have to delete the
oldest one every time you insert a new one), but that probably requires
read-on-write etc and is a lot messier.  and you will probably need to
worry about the case where you (transiently) have more than 10 'latest'
items for a single itemType.
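
A rough sketch of that capped-row idea, again on an in-memory sorted list rather than a real column family. `LIMIT`, `insert_capped` and the item type names are made up for illustration.

```python
import bisect

LIMIT = 10
row = []  # sorted composite column names: (itemType, activityId)

def insert_capped(item_type, activity_id):
    """Insert a column, then delete the oldest ones beyond LIMIT for that type."""
    bisect.insort(row, (item_type, activity_id))
    # Read-on-write: count the columns currently stored for this item type.
    lo = bisect.bisect_left(row, (item_type,))
    hi = bisect.bisect_right(row, (item_type, float("inf")))
    excess = (hi - lo) - LIMIT
    if excess > 0:
        # Until this delete runs, the row transiently holds more than LIMIT.
        del row[lo:lo + excess]

for ts in range(1, 26):
    insert_capped("photo", ts)
for ts in range(1, 4):
    insert_capped("video", ts)
```

The insert and the delete are two separate steps, so a reader arriving between them sees 11 columns for that type. A reader would still want to cap the result at 10 on its side.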

Re: Schema advice/help

Posted by Maciej Miklas <ma...@googlemail.com>.
yes - but anyway in your example you need a "key range query" and that
requires OPP, right?

Re: Schema advice/help

Posted by Guy Incognito <dn...@gmail.com>.
multiget does not require OPP.
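
A simplified model of why that holds: a multiget names every row key exactly, so each key can be hashed and routed on its own, and only a scan over a contiguous range of keys depends on key order. The four-node ring, the node names and the evenly spaced tokens below are assumptions for illustration, not how any particular cluster is configured.

```python
import bisect
import hashlib

# Hypothetical four-node ring with evenly spaced tokens over the MD5 range.
RING_SIZE = 2 ** 128
NODES = ["node-a", "node-b", "node-c", "node-d"]
TOKENS = [i * RING_SIZE // len(NODES) for i in range(len(NODES))]

def token(row_key):
    """Hash-based token: unrelated to the lexical order of the keys."""
    return int.from_bytes(hashlib.md5(row_key.encode()).digest(), "big")

def owner(row_key):
    """Node whose token is the first one at or after the key's token (wrapping)."""
    index = bisect.bisect_left(TOKENS, token(row_key)) % len(NODES)
    return NODES[index]

def multiget_plan(row_keys):
    # Each exact key is located independently; no ordering between keys is needed.
    return {key: owner(key) for key in row_keys}

keys = [f"user42:{t}" for t in ("photo", "video", "link", "note", "poll")]
plan = multiget_plan(keys)
```

The five keys may land on different nodes, which costs some fan-out but needs no order-preserving placement.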

Re: Schema advice/help

Posted by Maciej Miklas <ma...@googlemail.com>.
multiget would require Order Preserving Partitioner, and this can lead to
an unbalanced ring and hot spots.

Maybe you can use a secondary index on "itemtype" - it must have small
cardinality:
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/



Re: Schema advice/help

Posted by Guy Incognito <dn...@gmail.com>.
without the ability to do disjoint column slices, i would probably use 5 
different rows.

userId:itemType -> activityId

then it's a multiget slice of 10 items from each of your 5 rows.
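
A minimal in-memory sketch of that layout, with a dict standing in for the column family. The item type names and helper functions are hypothetical, and the loop over keys only mimics what a single multiget slice request would return.

```python
from collections import defaultdict

# Hypothetical stand-in for a column family:
# row key "userId:itemType" -> {activityId (timestamp): value}
rows = defaultdict(dict)

ITEM_TYPES = ["photo", "video", "link", "note", "poll"]  # assumed names

def add_activity(user_id, item_type, activity_id, value=""):
    rows[f"{user_id}:{item_type}"][activity_id] = value

def multiget_last(user_id, count=10):
    """Newest `count` activityIds from each of the 5 known row keys."""
    result = {}
    for item_type in ITEM_TYPES:
        columns = rows.get(f"{user_id}:{item_type}", {})
        # Reversed slice with a count limit, applied to every row alike.
        result[item_type] = sorted(columns, reverse=True)[:count]
    return result

for ts in range(1, 26):
    add_activity("user42", "photo", ts)
for ts in range(100, 103):
    add_activity("user42", "video", ts)

latest = multiget_last("user42")
```

The same slice (reversed, count 10) applies to every row, so the read never returns more than 50 columns.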
