
Posted to user@cassandra.apache.org by Christopher Wirt <ch...@struq.com> on 2013/12/05 16:44:12 UTC

Counters question - is there a better way to count

I want to build a really simple column family which counts the occurrences
of a single event X.

Once we reach Y occurrences of X, the counter resets to 0.

The obvious way to do this is with a counter CF:

 

CREATE TABLE xcounter1 (
    uid uuid,
    someid int,
    count counter,
    PRIMARY KEY (uid, someid)
);

 

This is how I've always done it in the past, but I've been told to avoid
counters for various reasons: performance, consistency, etc.

I'm not too bothered about 100% absolute consistency; however, read
performance is certainly a big concern.
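
For reference, the count-and-reset behaviour described above can be sketched in plain Python. This is only an in-memory illustration, not Cassandra or any driver API; the names `counts`, `record_event` and the threshold value `Y = 3` are made up for the example.

```python
from collections import defaultdict

Y = 3  # hypothetical reset threshold (the "Y" from the message)

# In-memory stand-in for the counter table: (uid, someid) -> current count
counts = defaultdict(int)


def record_event(uid, someid, threshold=Y):
    """Count one occurrence of X; reset to 0 once the threshold is reached.

    Returns True when this occurrence triggered the reset.
    """
    key = (uid, someid)
    counts[key] += 1
    if counts[key] >= threshold:
        counts[key] = 0
        return True
    return False
```

Note that the reset needs to know the current value, so in a real store it is a read followed by a write rather than a blind increment.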

 

So, to avoid using counters, I was thinking I could do something like this:

 

CREATE TABLE xcounter2 (
    uid uuid,
    someid int,
    time timeuuid,
    PRIMARY KEY (uid, someid, time)
);

 

Then retrieve all events for a (uid, someid) pair and count them in memory.
Delete all rows for that pair once I hit Y.
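
A rough in-memory Python sketch of this one-row-per-event idea follows. It is not Cassandra code: the `events` dict stands in for the xcounter2 table, `uuid.uuid1()` stands in for a timeuuid, and the function names are hypothetical.

```python
import uuid
from collections import defaultdict

# In-memory stand-in for xcounter2: (uid, someid) -> list of event timeuuids
events = defaultdict(list)


def record_event(uid, someid):
    """Write path: append one row per occurrence (uuid1 is time-based)."""
    events[(uid, someid)].append(uuid.uuid1())


def read_count(uid, someid, threshold):
    """Read path: fetch every row for the pair and count them in memory.

    Once the count reaches the threshold, all rows for the pair are
    deleted and the count is reported as 0 again.
    """
    key = (uid, someid)
    n = len(events.get(key, []))
    if n >= threshold:
        events.pop(key, None)  # delete all rows for this (uid, someid)
        return 0
    return n
```

In this shape every read touches up to Y rows per pair, and in Cassandra the periodic delete would leave tombstones behind, so the read cost is worth measuring before committing to it.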

 

Or I could do this:

CREATE TABLE xcounter3 (
    uid uuid,
    someid int,
    time timeuuid,
    Ycount int,
    PRIMARY KEY (uid, someid, time)
);

 

Insert a 'Ycount' (the running count) on each occurrence of the event.

On reading, only retrieve the last Ycount value inserted.

Then delete all records once I hit the magic Y value.
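
The running-count variant can be sketched the same way. Again this is only an in-memory Python illustration with made-up names (`rows`, `latest_count`, `record_event`); it assumes rows for a pair are kept oldest first, as they would be when clustered by the timeuuid.

```python
import uuid
from collections import defaultdict

# In-memory stand-in for xcounter3:
# (uid, someid) -> list of (timeuuid, ycount) rows, oldest first
rows = defaultdict(list)


def latest_count(uid, someid):
    """Read path: look only at the most recently inserted row."""
    history = rows.get((uid, someid))
    return history[-1][1] if history else 0


def record_event(uid, someid, threshold):
    """Write path: insert a new row carrying the running count.

    When the running count reaches the threshold, all rows for the
    pair are deleted instead, which resets the count to 0.
    """
    key = (uid, someid)
    new_count = latest_count(uid, someid) + 1
    if new_count >= threshold:
        rows.pop(key, None)
        return 0
    rows[key].append((uuid.uuid1(), new_count))
    return new_count
```

The sketch makes one cost visible: reads become cheap (one row), but each write first has to read the previous count, so concurrent writers for the same pair could insert the same Ycount.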

 

 

Does anyone have any thoughts or insight on which of these is likely to give
me the best read performance?

There will be 100s of someids for each uid. Reads will be 5-10x the writes.

 

 

Thanks,

Chris

Re: Counters question - is there a better way to count

Posted by Alex Popescu <al...@datastax.com>.

On Thu, Dec 5, 2013 at 7:44 AM, Christopher Wirt <ch...@struq.com> wrote:

> I want to build a really simple column family which counts the occurrence
> of a single event X.
>
>
The guys from Disqus are big into counters:

https://www.youtube.com/watch?v=A2WdS0YQADo

http://www.slideshare.net/planetcassandra/cassandra-at-disqus (relevant
slides start at 25)


-- 

:- a)


Alex Popescu
@al3xandru

RE: Counters question - is there a better way to count

Posted by Christopher Wirt <ch...@struq.com>.

Hi Andy,

There will be tens of millions of uids, each with 100s of someids, being
accessed each day.

 

Hi Przemek, 

We currently use counter column families, but they are some of our slowest.
(They are also some of our biggest, so the counter type might not be the
issue.)

We have a strong need for a cross-DC solution. We could use Redis and handle
the replication ourselves, but are hoping not to have to do this.

 

Regarding tweaking the compaction thresholds, do you mean
increasing/decreasing the min/max compaction thresholds? I guess decreasing
both values would result in more compaction, so fewer SSTables per read and
therefore faster reads (at the cost of heavier CPU/disk usage)?

 

We will always require all of a uid's someids, so adding someid to the
partition key is not an option at this time.
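
The access-pattern point can be illustrated with a small in-memory Python sketch (not Cassandra itself; the sample rows and function names are made up). With uid alone as the partition key, all of a uid's someids sit together and come back in one lookup; with (uid, someid) as the partition key, each pair is its own partition, so the reader has to already know which someids to ask for.

```python
from collections import defaultdict

# Made-up sample data: (uid, someid) pairs
sample_rows = [("u1", 1), ("u1", 2), ("u1", 3), ("u2", 1)]

# Layout A: partition key = uid, someid is a clustering column
by_uid = defaultdict(list)
for uid, someid in sample_rows:
    by_uid[uid].append(someid)

# Layout B: partition key = (uid, someid)
by_pair = defaultdict(list)
for uid, someid in sample_rows:
    by_pair[(uid, someid)].append(someid)


def someids_single_partition(uid):
    """Layout A: one partition lookup returns every someid for the uid."""
    return list(by_uid.get(uid, []))


def someids_composite(uid, known_someids):
    """Layout B: one lookup per candidate someid, which must be known up front."""
    return [s for s in known_someids if (uid, s) in by_pair]
```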

 

Thanks,

Chris

 

 

 

From: Przemek Maciolek [mailto:pmaciolek@gmail.com] 
Sent: 05 December 2013 16:04
To: user@cassandra.apache.org
Subject: Re: Counters question - is there a better way to count

 

Some big systems have been built using Cassandra's counters (such as Rainbird:
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011)
and seem to be doing a great job.

If you are concerned with performance, then maybe using a memory-based store
(such as Redis) will better suit your case (as long as it fits in memory,
but considering the data model, I guess it might work).

If you are going to stick with Cassandra, then tweaking the compaction
thresholds can make a visible difference to read performance, at least from
what I have seen. You can also consider changing the PRIMARY KEY to
((uid, someid), time) - this will make the partition key out of uid+someid,
rather than just uid. Depending on the access pattern, it might help.

 


Re: Counters question - is there a better way to count

Posted by Przemek Maciolek <pm...@gmail.com>.

Some big systems have been built using Cassandra's counters (such as Rainbird:
http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011)
and seem to be doing a great job.

If you are concerned with performance, then maybe using a memory-based store
(such as Redis) will better suit your case (as long as it fits in memory,
but considering the data model, I guess it might work).

If you are going to stick with Cassandra, then tweaking the compaction
thresholds can make a visible difference to read performance, at least
from what I have seen. You can also consider changing the PRIMARY KEY to
((uid, someid), time) - this will make the partition key out of uid+someid,
rather than just uid. Depending on the access pattern, it might help.


Re: Counters question - is there a better way to count

Posted by Andy Twigg <an...@gmail.com>.

How many distinct (uid, someid) pairs will you have?