Posted to user@cassandra.apache.org by Darren Smythe <da...@gmail.com> on 2013/06/13 20:19:50 UTC

Billions of counters

We want to precalculate counts for some common usage metrics. We have
events, locations, products, etc. The problem is that we have millions of
events/day, thousands of locations, and millions of products.

We're trying to precalculate counts for some common queries like 'how many
times was product X purchased in location Y last week'.

It seems like we'll end up with trillions of counters for even these basic
permutations. Is this a cause for concern?

TIA

-- Darren

Re: Billions of counters

Posted by Janne Jalkanen <ja...@ecyrd.com>.
Hi!

We have a similar situation of millions of events on millions of items. It turns out that this isn't really a problem, because there tends to be a very strong power-law distribution: very few of the items get a lot of hits, some get some, and the majority get no hits (though most of them do get hits every now and then).  So it's basically a sparse multidimensional array, and it turns out that Cassandra is pretty good at storing those.  We just treat a missing counter column as zero, and add a counter only when necessary.  To avoid I/O, we also do some statistical sampling for certain counters where we don't need an exact figure.
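
A minimal sketch of the two ideas above (missing counter means zero, and sampled increments), using an in-memory dict as a stand-in for the counter table. The class name SparseCounters, the sample_rate parameter and the key layout are hypothetical and are not part of any Cassandra API; the scaling by 1/sample_rate is one common way to do the sampling, not necessarily the one described here.

```python
import random


class SparseCounters:
    """In-memory stand-in for a sparse counter table (hypothetical helper)."""

    def __init__(self, sample_rate=1.0, rng=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.counts = {}  # only keys that were actually hit get an entry

    def increment(self, key, amount=1):
        if self.sample_rate == 1.0:
            # Exact counting: every event is written.
            self.counts[key] = self.counts.get(key, 0) + amount
            return
        # Sampled counting: skip most writes, and scale the ones we keep
        # so the stored value is an unbiased estimate of the true count.
        if self.rng.random() < self.sample_rate:
            self.counts[key] = self.counts.get(key, 0) + amount / self.sample_rate

    def get(self, key):
        # A missing counter is treated as zero; nothing is stored for it.
        return self.counts.get(key, 0)


exact = SparseCounters()
exact.increment(("product-x", "location-y", "2013-W24"))
exact.increment(("product-x", "location-y", "2013-W24"))

# Roughly 1 write per 10 events; the result is an estimate, not an exact count.
sampled = SparseCounters(sample_rate=0.1, rng=random.Random(42))
for _ in range(100_000):
    sampled.increment("page-views")
```

Here exact.get() on a key that was never incremented returns 0 without creating an entry, so storage grows with the number of combinations that were actually hit, not with the number of possible combinations.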

YMMV, of course, but I'd look at how likely it is that every product gets purchased at least once from the same location within one week, and start the modeling from there. :)
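
As a rough starting point for that modeling, here is a back-of-the-envelope bound. It assumes each event increments exactly one (product, location, week) counter, and the concrete figures are made up for illustration, since the thread only says "millions" and "thousands". The function name max_populated_counters is hypothetical.

```python
def max_populated_counters(events_per_day, days, products, locations):
    """Upper bound on distinct (product, location) counters hit in a window.

    Assumes one counter increment per event, so the number of populated
    counters can exceed neither the event count nor the number of cells.
    """
    total_events = events_per_day * days
    possible_cells = products * locations
    return min(total_events, possible_cells)


# Illustrative figures only, not taken from the thread.
events_per_day = 5_000_000
products = 2_000_000
locations = 5_000

possible = products * locations
populated = max_populated_counters(events_per_day, 7, products, locations)

print(f"possible cells per week:  {possible:,}")
print(f"populated cells at most:  {populated:,}")
print(f"fill ratio at most:       {populated / possible:.2%}")
```

Under these assumed numbers, a week can populate at most 35 million of the 10 billion possible cells, and a skewed hit distribution would push the real figure lower still, since repeat purchases of popular items land on the same counter.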

/Janne

On 13 Jun 2013, at 21:19, Darren Smythe <da...@gmail.com> wrote:

> We want to precalculate counts for some common usage metrics. We have events, locations, products, etc. The problem is that we have millions of events/day, thousands of locations, and millions of products.
> 
> We're trying to precalculate counts for some common queries like 'how many times was product X purchased in location Y last week'.
> 
> It seems like we'll end up with trillions of counters for even these basic permutations. Is this a cause for concern?
> 
> TIA
> 
> -- Darren