You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Nicolas Favre-Felix (JIRA)" <ji...@apache.org> on 2013/10/01 16:37:31 UTC

[jira] [Commented] (CASSANDRA-4775) Counters 2.0

    [ https://issues.apache.org/jira/browse/CASSANDRA-4775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782998#comment-13782998 ] 

Nicolas Favre-Felix commented on CASSANDRA-4775:
------------------------------------------------

I realize that there hasn't been much progress on this ticket during the summer. Following [~jbellis]'s call for contributors on cassandra-dev, we (Acunu) are willing to dedicate more time and resources to this particular ticket to see it implemented in the near future.

[~iamaleksey], thanks for your comments and critique; I would like to summarize the above and hopefully move this discussion towards a design that we can agree on.

As you pointed out, the set-based approach has some significant drawbacks:
* The design is complex and the distributed problem of merging deltas pushes some of this complexity to the client side.
* Reads are always more expensive as all replicas need to return the set of deltas they know about.
* The unpredictability in the distribution of read latencies is a serious issue.

The set-based design also has two advantages, speed (skipping a random read) and idempotence; I'd like to address these two below.

Although the read on the write path is an anti-pattern for Cassandra, it serves a useful purpose and ensures that the data sent from the coordinator to the replicas is always already summed up which means that reads stay predictably fast. The read during replicate_on_write is also more expensive than the one in a RMW design since we have to look up all the counter shards in (potentially) many sstables, and we have to do this every single time we increment a counter. RMW counters in Riak and HBase only need to read the latest value for the counter, and can immediately write it back to an in-memory data structure. I would also like to point out that reads will only become cheaper in the future as solid state drives become more commonplace, and not by a small amount. Designing a complex and unreliable solution to address a disappearing problem would be a mistake.

Idempotence is also a very useful property and the lack of safety for Cassandra counters is probably the first drawback that people mention when they describe Cassandra counters. Their reliability is questioned pretty often, and this criticism is not without merit. [~slebresne]'s suggestion of a retryable increment with a lock around the commit log entry is a great improvement over the current design, with the only limitation that the operation has to be commutative. I have written above that I had my doubts about the throughput of counters with such a lock, but I also recognize that there could be more optimisations as some suggested.

Internally, I would suggest the sharded counter design remain similar with one shard per replica for all commutative replicated data types; implementing counters as PN-counters is also a great way to introduce more general data types, something I believe would be much welcomed by the Cassandra community.

After this long discussion I believe the following design would be a viable alternative:
* Remove the replicate_on_write option and switch to a locked/retryable RMW.
* Provide a pluggable way to implement commutative types, with a PN-counter implementation.

This solution could also be migrated to from an existing deployment; the proposed set-based design would be too complex for that.

We are ready to work on the implementation as well in time for the target release of 2.1.

> Counters 2.0
> ------------
>
>                 Key: CASSANDRA-4775
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4775
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Arya Goudarzi
>            Assignee: Aleksey Yeschenko
>              Labels: counters
>             Fix For: 2.1
>
>
> The existing partitioned counters remain a source of frustration for most users almost two years after being introduced.  The remaining problems are inherent in the design, not something that can be fixed given enough time/eyeballs.
> Ideally a solution would give us
> - similar performance
> - less special cases in the code
> - potential for a retry mechanism



--
This message was sent by Atlassian JIRA
(v6.1#6144)