You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2013/02/26 05:08:37 UTC
Notes from committer's meeting: counters

For me, counters is kind of that place on the map of the world from
the 15th century that says, "Here there be dragons."  So I'll do my
best to relate the discussion accurately but it's quite possible I got
something wrong.

There are multiple problems with the current counter implementation:

* You cannot move sstables around (i.e., between machines, or from
snapshots).  Which is not to say you can't do it AT ALL, but it's a
lot more fragile than normal data files.  Sylvain said something about
solving this with locking and version increments, hopefully he can
elaborate.
* This directly follows from counter updates not being idempotent --
if an update times out, you can't simply retry; if the original did in
fact succeed then you will have double-counted.  But if you don't
retry you may have lost an update.  Pick your poison!  Sylvain has
tried to solve this with an extra columnfamily containing update uuids
and values, to allow replaying with eventual consistency, but couldn't
get the gremlins out [1].
* Performance is also problematic.  The original design called for
no-read-before-write, but we discovered relatively early on that we
needed to merge counters before replicating them, which requires
performing a read to get the full, current value.  Thus, the infamous
replicate_on_write option -- turning it off improves performance
dramatically, but you will almost certainly lose data at some point if
you do.
* replicate_on_write is extra devious because it performs the read in
the background at CL.ONE.  This surprises people in two ways: when
latency gets dramatically worse when writing at QUORUM, and when at
some point your writes start forcing too many random reads for your
disks to handle and performance drops off a cliff across the board.
* Despite our best efforts, counters remain buggy.  It's probably not
much of an exaggeration to say that everyone who runs counters in
production has seen "invalid shard" errors in the log.
* These bugs may be solvable "given enough eyeballs," but other
problems are inherent in the design.  Perhaps most severe is that RF=1
means you can lose data during bootstrap; we [3] know why, but it's
not solvable with the current design.

[4] proposes a new counters design that addresses these problems.  We
briefly discussed Jonathan Haliday's suggestion to broaden counters to
include user-defined aggregation.  This seems doable, but on the other
hand, we don't even have a working replacement for the existing
functionality yet.

[1] https://issues.apache.org/jira/browse/CASSANDRA-2495
[2] https://issues.apache.org/jira/browse/CASSANDRA-3868
[3] Sylvain
[4] https://issues.apache.org/jira/browse/CASSANDRA-4775

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder, http://www.datastax.com
@spyced