Posted to user@cassandra.apache.org by tommaso barbugli <tb...@gmail.com> on 2014/08/09 11:05:51 UTC

aggregating data in cassandra

Hi everyone,
I am a bit stuck with my data model on Cassandra. What I am trying to do is
retrieve rows in groups, something similar to SQL's GROUP BY but working on
only one attribute.

I keep the grouped data in a separate CF (e.g. GROUP BY x has its own CF,
groupby_x). Whenever a new row is inserted, I read from the CF with the
grouped data, find out which group is a good fit, and then update that row.
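A minimal Python sketch of the read-then-update insert path described above. It assumes a group is identified simply by the value of x (the actual "good fit" logic may be more involved) and uses a plain dict as a hypothetical stand-in for the groupby_x CF. `GroupRow` and `insert_row` are illustrative names, not Cassandra or driver APIs.

```python
from dataclasses import dataclass, field


@dataclass
class GroupRow:
    # Hypothetical summary row stored per group in groupby_x.
    count: int = 0
    last_updated: float = 0.0
    members: list = field(default_factory=list)


def insert_row(groupby_x: dict, row_id: str, x: str, ts: float) -> None:
    # 1. Read: fetch the group this row belongs to (one read per insert).
    group = groupby_x.get(x)
    if group is None:
        group = GroupRow()
    # 2. Modify: update the group's summary in application code.
    group.count += 1
    group.last_updated = max(group.last_updated, ts)
    group.members.append(row_id)
    # 3. Write: store the updated group row back.
    groupby_x[x] = group


cf: dict = {}
insert_row(cf, "r1", "a", 1.0)
insert_row(cf, "r2", "a", 3.0)
insert_row(cf, "r3", "b", 2.0)
print(cf["a"].count, cf["a"].last_updated)  # prints "2 3.0"
```

Every insert pays for a read before its write, and steps 1 to 3 are not atomic, which is where the problems below come from.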

This, of course, has serious problems: every insert needs to read from the
aggregated CF to find a good match and then update a row, which means I am
running into race conditions and tombstones, the latter being my main
concern.
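The race condition can be reproduced without Cassandra. This hypothetical sketch interleaves two read-modify-write sequences on the same group by hand, so the outcome is deterministic: both writers read count 0, and the second write simply overwrites the first (comparable to Cassandra's last-write-wins handling of concurrent writes to the same column). It models only the race; tombstones are a Cassandra storage detail and are not simulated here.

```python
def lost_update_demo() -> int:
    # Hypothetical stand-in for one group row in the aggregated CF.
    store = {"group_a": {"count": 0}}

    def read(key: str) -> dict:
        # A client read returns a snapshot, not a live reference.
        return dict(store[key])

    def write(key: str, row: dict) -> None:
        # A plain overwrite: the last writer wins.
        store[key] = row

    snap1 = read("group_a")   # writer 1 reads count = 0
    snap2 = read("group_a")   # writer 2 reads count = 0 before writer 1 writes
    snap1["count"] += 1
    write("group_a", snap1)   # count is now 1
    snap2["count"] += 1
    write("group_a", snap2)   # count is still 1: writer 1's increment is lost
    return store["group_a"]["count"]


print(lost_update_demo())  # prints 1, although two rows were added
```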

So far I have been able to keep the thing running by locking groups and by
batching updates together (so that there are fewer updates to the same
group), but I can see this is not the way to go.
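The locking-plus-batching workaround can be sketched as follows, under stated assumptions: `threading.Lock` only serializes writers inside one Python process (several application servers would need some form of distributed lock), the dict again stands in for the groupby_x CF, and `GroupAggregator` is a hypothetical name. Batching collapses all pending inserts for a group into one read-modify-write, which is what cuts the number of updates per group.

```python
import threading
from collections import Counter


class GroupAggregator:
    """Hypothetical sketch: one lock per group plus batched updates."""

    def __init__(self) -> None:
        self.counts: dict = {}                   # stand-in for the groupby_x CF
        self._locks: dict = {}                   # group key -> threading.Lock
        self._registry_lock = threading.Lock()   # guards creation of per-group locks

    def _lock_for(self, group: str):
        with self._registry_lock:
            if group not in self._locks:
                self._locks[group] = threading.Lock()
            return self._locks[group]

    def apply_batch(self, group_keys: list) -> None:
        # Collapse the batch so each group gets one read-modify-write
        # instead of one per inserted row.
        for group, n in Counter(group_keys).items():
            with self._lock_for(group):              # one writer per group at a time
                current = self.counts.get(group, 0)  # read
                self.counts[group] = current + n     # write


agg = GroupAggregator()
batches = [["a", "a", "b"], ["a", "c"], ["b", "b"]] * 100
threads = [threading.Thread(target=agg.apply_batch, args=(b,)) for b in batches]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(agg.counts.items()))  # prints [('a', 300), ('b', 300), ('c', 100)]
```

The counts come out right only because every writer goes through the same in-process locks; nothing here helps with tombstones.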

Unfortunately, I can't find another model that lets me paginate data in
groups sorted by their update time (in fact, a group holds some information
about itself, such as its last-updated time, counts, ...).
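The access pattern being asked for, paging through groups newest-update-first, is shown here as an in-memory sketch. It orders groups by last update time (descending), breaks ties by key, and returns a cursor for the next page. `Group` and `page_groups` are hypothetical names, and the sketch says nothing about how such an ordering would be stored or maintained in Cassandra.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Group:
    key: str           # value of the grouping attribute x
    last_updated: int  # e.g. a Unix timestamp
    count: int


def page_groups(groups, page_size: int, after: Optional[Tuple[int, str]] = None):
    # Newest update first; ties on last_updated are broken by key so the
    # order is total and consecutive pages never overlap or skip a group.
    def sort_key(g: Group):
        return (-g.last_updated, g.key)

    ordered = sorted(groups, key=sort_key)
    if after is not None:
        ts, key = after
        ordered = [g for g in ordered if sort_key(g) > (-ts, key)]
    page = ordered[:page_size]
    # A full page may be followed by more groups, so return a cursor;
    # a short (or empty) page is the last one.
    cursor = (page[-1].last_updated, page[-1].key) if page and len(page) == page_size else None
    return page, cursor


groups = [Group("a", 5, 1), Group("b", 9, 2), Group("c", 7, 1), Group("d", 9, 4)]
page, cursor = page_groups(groups, 2)          # b, d (both updated at 9)
page, cursor = page_groups(groups, 2, cursor)  # c, a
```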

Thanks,
Tommaso