Posted to user@cassandra.apache.org by Darren Smythe <da...@gmail.com> on 2013/06/13 20:19:50 UTC
Billions of counters
We want to precalculate counts for some common usage metrics. We have
events, locations, products, etc. The problem is we have millions of
events/day, thousands of locations and millions of products.
We're trying to precalculate counts for some common queries like 'how many
times was product X purchased in location Y last week'.
It seems like we'll end up with trillions of counters for even these basic
permutations. Is this a cause for concern?
TIA
-- Darren
Re: Billions of counters
Posted by Janne Jalkanen <ja...@ecyrd.com>.
Hi!
We have a similar situation, with millions of events on millions of items. It turns out that this isn't really a problem, because there tends to be a very strong power-law distribution: very few of the items get a lot of hits, some get some, and the majority get no hits in any given period (though most of them do get hits every now and then). So it's basically a sparse multidimensional array, and it turns out that Cassandra is pretty good at storing those. We just treat a missing counter column as zero, and add a counter only when necessary. To avoid I/O, we also do some statistical sampling for certain counters where we don't need an exact figure.
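A minimal sketch of the two ideas above (missing counter reads as zero, and sampled increments), assuming an in-memory dict as a stand-in for the counter table. The class name, the (product, location, week) key layout and the sample_rate parameter are hypothetical and only for illustration; this is not the code described in this message and it does not talk to Cassandra.

```python
import random


class SparseCounters:
    """In-memory stand-in for a counter table: only touched keys are stored."""

    def __init__(self, sample_rate=1.0, rng=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self._rng = rng or random.Random()
        self._counts = {}  # (product, location, week) -> sampled hit count

    def record(self, product, location, week):
        # With sampling enabled, skip most events to save writes.
        if self.sample_rate < 1.0 and self._rng.random() >= self.sample_rate:
            return
        key = (product, location, week)
        # The counter is created only on the first write that touches it.
        self._counts[key] = self._counts.get(key, 0) + 1

    def estimate(self, product, location, week):
        # A missing counter reads as zero; sampled counts are scaled back up.
        raw = self._counts.get((product, location, week), 0)
        return raw / self.sample_rate

    def stored(self):
        # Number of counters that actually exist.
        return len(self._counts)
```

With sample_rate=1.0 the figures are exact; with a lower rate, estimate() returns an approximation whose relative error shrinks as the true count grows, which is why sampling suits the busy counters better than the rare ones.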
YMMV, of course, but I'd look at how likely it is that every product gets purchased at least once from the same location during one week, and start the modeling from there. :)
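One rough way to start that modeling, as a back-of-the-envelope sketch: if each purchase event increments exactly one (product, location, week) counter, the number of counters that exist for a week cannot exceed the number of events in that week. The concrete figures below are assumptions picked to match "millions of events/day, thousands of locations and millions of products"; they are not numbers from this thread.

```python
def counter_bounds(events_per_day, locations, products, days=7):
    """Return (possible, touched_max) for one period of `days` days.

    possible    -- size of the full product x location grid
    touched_max -- upper bound on counters that can be non-zero, since
                   each event touches at most one counter in this scheme
    """
    possible = locations * products
    events = events_per_day * days
    touched_max = min(possible, events)
    return possible, touched_max


# Assumed example figures, not taken from the thread.
possible, touched_max = counter_bounds(
    events_per_day=5_000_000, locations=5_000, products=2_000_000
)
print(possible)                # 10000000000 possible cells per week
print(touched_max)             # 35000000 counters at most
print(touched_max / possible)  # 0.0035
```

Under these assumed figures, fewer than 1% of the possible counters can exist in any one week, and a skewed distribution pushes the real number lower still because many events land on the same popular counters.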
/Janne