Posted to user@cassandra.apache.org by Ertio Lew <er...@gmail.com> on 2012/03/26 23:16:25 UTC
Schema advice/help
I need to store each user's activities on 5 item types. I always want to
read the last 10 activities on each item type for a user (i.e., 50
activities in total per read).
I want to store these activities in a single row per user so that they can
be retrieved with a single-row query, since I always read the last 10
activities on every item type together. I am thinking of using composite
column names of the form "itemtype:activityId" (activityId is just a
timestamp value), but then I don't see how to read the last 10 activities
for all item types.
Any ideas for a better schema?
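To make the difficulty concrete, here is a minimal Python sketch that models one user's row as composite column names sorted by (itemtype, activityId). It is an in-memory illustration only, not Cassandra client code; the item type names, the 30 activities per type, and the `newest_columns` helper are assumptions made up for the example. It shows that one contiguous slice of 50 columns from the end of the row does not return 10 columns per item type.

```python
from collections import Counter

# Hypothetical item types; the post only says there are 5 of them.
ITEM_TYPES = ["answer", "comment", "photo", "post", "question"]

# One user's row: composite column names (itemtype, activityId), kept in
# comparator order (by itemtype first, then by activityId).
row = sorted(
    (item_type, activity_id)
    for item_type in ITEM_TYPES
    for activity_id in range(1, 31)  # assume 30 activities per type
)


def newest_columns(columns, count):
    """Model a reversed slice with no bounds: the last `count` columns."""
    return columns[-count:][::-1]


single_slice = newest_columns(row, 50)
per_type = Counter(item_type for item_type, _ in single_slice)

# The slice is contiguous, so it is filled by the item types that sort
# last; the other types are not reached at all.
print(per_type)
```

Because the columns are ordered by item type first, the 50 newest columns in comparator order all come from the last one or two item types, which is why a single unbounded slice cannot answer the query.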
Re: Schema advice/help
Posted by Ertio Lew <er...@gmail.com>.
@R. Verlangen:
You are suggesting keeping a single row for all activities, reading all the
columns from that row, and then filtering, right?
If done that way (instead of keeping 5 rows), I would need to retrieve
hundreds of columns from a single row rather than just 50 columns spread
over 5 rows. Which of these two would be better: more columns from a
single row, or fewer columns from multiple rows?
On Tue, Mar 27, 2012 at 2:27 PM, R. Verlangen <ro...@us2.nl> wrote:
> You can just do a slice range with "userId:" as the start and no end.
Re: Schema advice/help
Posted by "R. Verlangen" <ro...@us2.nl>.
You can just do a slice range with "userId:" as the start and no end.
--
With kind regards,
Robin Verlangen
www.robinverlangen.nl
Re: Schema advice/help
Posted by Maciej Miklas <ma...@googlemail.com>.
Correct - I also see no other solution for this problem.
On Thu, Mar 29, 2012 at 1:46 AM, Guy Incognito <dn...@gmail.com> wrote:
> Well, no. My assumption is that he knows what the 5 itemTypes (or
> appropriate corresponding ids) are, so he can do a known 5-rowkey lookup.
> If he does not know, then agreed, my proposal is not a great fit.
>
> Could do (as originally suggested)
>
> userId -> itemType:activityId
>
> if you want to keep everything in the same row (again, this assumes that
> you know what the itemTypes are). But then you can't really do a multiget;
> you have to do 5 separate slice queries, one for each item type.
>
> You can also do some wacky stuff around maintaining a row that explicitly
> only holds the last 10 items by itemType (meaning you have to delete the
> oldest one every time you insert a new one), but that probably requires
> read-on-write etc. and is a lot messier. And you will probably need to
> worry about the case where you (transiently) have more than 10 'latest'
> items for a single itemType.
Re: Schema advice/help
Posted by Guy Incognito <dn...@gmail.com>.
Well, no. My assumption is that he knows what the 5 itemTypes (or
appropriate corresponding ids) are, so he can do a known 5-rowkey
lookup. If he does not know, then agreed, my proposal is not a great fit.
Could do (as originally suggested)
userId -> itemType:activityId
if you want to keep everything in the same row (again, this assumes that you
know what the itemTypes are). But then you can't really do a multiget;
you have to do 5 separate slice queries, one for each item type.
You can also do some wacky stuff around maintaining a row that explicitly
only holds the last 10 items by itemType (meaning you have to delete the
oldest one every time you insert a new one), but that probably requires
read-on-write etc. and is a lot messier. And you will probably need to
worry about the case where you (transiently) have more than 10 'latest'
items for a single itemType.
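The single-row variant with 5 separate slice queries can be sketched as follows. This is a plain Python model of a sorted row, not real client code; the item type names and the `last_n_for_type` helper are hypothetical, and each call stands in for one reversed column slice bounded to a single itemType prefix.

```python
import bisect

# Hypothetical item types; the reader is assumed to know all 5 up front.
ITEM_TYPES = ["answer", "comment", "photo", "post", "question"]

# One row per user: columns named (itemType, activityId) in sorted order.
row = sorted(
    (item_type, activity_id)
    for item_type in ITEM_TYPES
    for activity_id in range(1, 31)  # assume 30 activities per type
)


def last_n_for_type(columns, item_type, n):
    """Model one bounded, reversed slice: newest `n` columns of one type."""
    # (item_type,) sorts before every (item_type, x); (item_type, inf)
    # sorts after every (item_type, x), so [lo, hi) covers exactly one type.
    lo = bisect.bisect_left(columns, (item_type,))
    hi = bisect.bisect_right(columns, (item_type, float("inf")))
    return columns[max(lo, hi - n):hi][::-1]


# Five separate slices against the same row, one per item type.
latest = {t: last_n_for_type(row, t, 10) for t in ITEM_TYPES}
total = sum(len(cols) for cols in latest.values())
print(total)
```

Each slice touches only the columns it returns, so the read stays at 50 columns in total, at the cost of issuing 5 queries instead of one.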
On 28/03/2012 09:49, Maciej Miklas wrote:
> Yes - but anyway, in your example you need a "key range query", and that
> requires OPP, right?
Re: Schema advice/help
Posted by Maciej Miklas <ma...@googlemail.com>.
Yes - but anyway, in your example you need a "key range query", and that
requires OPP, right?
On Tue, Mar 27, 2012 at 5:13 PM, Guy Incognito <dn...@gmail.com> wrote:
> multiget does not require OPP.
Re: Schema advice/help
Posted by Guy Incognito <dn...@gmail.com>.
Multiget does not require OPP (the Order Preserving Partitioner).
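A rough way to see why: a multiget names every row key explicitly, so each key can be hashed to find its owner, and the relative order of keys never matters. The sketch below is a simplified in-memory model, assuming a made-up 3-node cluster and modulo placement instead of real token ranges; `owner`, `put`, and `multiget` are illustrative names, not a client API.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical 3-node cluster
storage = {node: {} for node in NODES}


def owner(row_key):
    """Hash the key to pick a node, in the spirit of a hash partitioner."""
    token = int(hashlib.md5(row_key.encode("utf-8")).hexdigest(), 16)
    return NODES[token % len(NODES)]


def put(row_key, columns):
    storage[owner(row_key)][row_key] = columns


def multiget(row_keys):
    """Fetch each named key from its owner; key ordering is never used."""
    return {key: storage[owner(key)].get(key) for key in row_keys}


# Hypothetical item types, giving 5 known row keys for one user.
ITEM_TYPES = ["answer", "comment", "photo", "post", "question"]
keys = [f"user42:{item_type}" for item_type in ITEM_TYPES]
for key in keys:
    put(key, [3, 2, 1])

found = multiget(keys)
print(sorted(found))
```

A key range query is different: it asks for all keys between two bounds, which is only meaningful when keys are stored in key order.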
On 27/03/2012 09:51, Maciej Miklas wrote:
> Multiget would require the Order Preserving Partitioner, and this can lead
> to an unbalanced ring and hot spots.
>
> Maybe you can use a secondary index on "itemtype" - it must have low
> cardinality:
> http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
Re: Schema advice/help
Posted by Maciej Miklas <ma...@googlemail.com>.
Multiget would require the Order Preserving Partitioner, and this can lead
to an unbalanced ring and hot spots.
Maybe you can use a secondary index on "itemtype" - it must have low
cardinality:
http://pkghosh.wordpress.com/2011/03/02/cassandra-secondary-index-patterns/
On Tue, Mar 27, 2012 at 10:10 AM, Guy Incognito <dn...@gmail.com> wrote:
> Without the ability to do disjoint column slices, I would probably use 5
> different rows.
>
> userId:itemType -> activityId
>
> Then it's a multiget slice of 10 items from each of your 5 rows.
Re: Schema advice/help
Posted by Guy Incognito <dn...@gmail.com>.
Without the ability to do disjoint column slices, I would probably use 5
different rows.
userId:itemType -> activityId
Then it's a multiget slice of 10 items from each of your 5 rows.
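As a minimal sketch of that layout, the Python below models rows keyed by "userId:itemType" with activityId as the column name, and a multiget slice that returns the newest 10 columns from each named row. It is an in-memory stand-in, not a Cassandra client; `add_activity`, `multiget_slice`, the item type names, and the payload strings are all assumptions for illustration.

```python
# Row key "userId:itemType" -> {activityId: payload}
rows = {}


def add_activity(user_id, item_type, activity_id, payload):
    """Write one column into the row for this user and item type."""
    rows.setdefault(f"{user_id}:{item_type}", {})[activity_id] = payload


def multiget_slice(row_keys, count):
    """For each named row, return its `count` newest columns, newest first."""
    result = {}
    for key in row_keys:
        columns = rows.get(key, {})
        newest = sorted(columns, reverse=True)[:count]
        result[key] = [(activity_id, columns[activity_id]) for activity_id in newest]
    return result


# Hypothetical item types and data: 30 activities per type for one user.
ITEM_TYPES = ["answer", "comment", "photo", "post", "question"]
for item_type in ITEM_TYPES:
    for activity_id in range(1, 31):
        add_activity("user42", item_type, activity_id, f"{item_type} event {activity_id}")

keys = [f"user42:{item_type}" for item_type in ITEM_TYPES]
latest = multiget_slice(keys, 10)
total = sum(len(columns) for columns in latest.values())
print(total)
```

The same slice (reversed, limit 10) applies to every row, which is what lets one multiget call cover all 5 item types.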