Posted to commits@cassandra.apache.org by "Yuki Morishita (Commented) (JIRA)" <ji...@apache.org> on 2012/02/07 00:25:00 UTC

[jira] [Commented] (CASSANDRA-3821) Counters in super columns don't preserve correct values after cluster restart

    [ https://issues.apache.org/jira/browse/CASSANDRA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13201768#comment-13201768 ] 

Yuki Morishita commented on CASSANDRA-3821:
-------------------------------------------

Here is my initial look at the issue (might be wrong):

Concurrent replay of counter mutations from the commitlog, combined with AtomicSortedColumns inside Memtable, seems to be the cause of the overcount.
There is a race condition when adding a column to the memtable, and when it happens AtomicSortedColumns calls {{IColumn#reconcile}} multiple times until the column is stored. This causes the overcount because a counter column's {{reconcile}} is not an idempotent operation.
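
To illustrate the suspected mechanism, here is a minimal, deterministic sketch of a compare-and-set retry loop whose merge step has a side effect. It is not Cassandra code: {{MutableCounter}}, {{applyWithRetries}} and the {{forcedFailures}} parameter are hypothetical names made up for this example, and the lost race is simulated in a single thread rather than produced by real concurrency. It only shows why repeating a non-idempotent {{reconcile}} on every retry inflates the stored value.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not the actual AtomicSortedColumns implementation.
class ReconcileRetryDemo {

    // Stand-in for a counter cell that is merged in place.
    static final class MutableCounter {
        long value;

        MutableCounter(long value) {
            this.value = value;
        }

        // Not idempotent: every call adds the delta again.
        void reconcile(long delta) {
            value += delta;
        }
    }

    // Applies one logical increment using a CAS retry loop.
    // The holder reference is swapped atomically, but the counter is shared
    // across attempts, so each failed attempt leaves its increment behind.
    static long applyWithRetries(long delta, int forcedFailures) {
        MutableCounter counter = new MutableCounter(0);
        AtomicReference<Object> holder = new AtomicReference<>(new Object());
        int failuresLeft = forcedFailures;

        while (true) {
            Object current = holder.get();
            counter.reconcile(delta); // side effect happens before the CAS

            if (failuresLeft > 0) {
                failuresLeft--;
                // Simulate a concurrent writer winning the race.
                holder.set(new Object());
            }

            if (holder.compareAndSet(current, new Object())) {
                return counter.value;
            }
            // CAS failed: loop and reconcile again.
        }
    }

    public static void main(String[] args) {
        System.out.println(applyWithRetries(1, 0)); // no contention
        System.out.println(applyWithRetries(1, 2)); // two lost races
    }
}
```

With no contention the single increment is stored as 1; with two simulated lost races the same single increment ends up as 3. If the real code path behaves like this, a fix would need either an idempotent merge or a retry that starts from an untouched copy.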
                
> Counters in super columns don't preserve correct values after cluster restart
> -----------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3821
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.1
>         Environment: ubuntu, 'trunk' branch, used ccm to create a 3 node cluster with rf=3. A dtest was created to demonstrate.
>            Reporter: Tyler Patterson
>
> Set up a 3-node cluster with rf=3. Create a counter super column family and increment a bunch of subcolumns 100 times each, with cl=QUORUM. Then wait a few seconds, restart the cluster, and read the values back. Almost all of them come back different from (and higher than) they are supposed to be.
> Here are some extra things I've noticed:
>  - Reading back the values before the restart always produces correct results.
>  - Doing a nodetool flush before killing the cluster greatly improves the results, though sometimes a value will still be incorrect. You might have to run the test several times to see an incorrect value after a flush.
>  - This problem doesn't happen on C* 1.0.7, unless you don't sleep between doing the increments and killing the cluster. Then it sometimes happens to a lesser degree.
> The dtest that demonstrates this issue is called "super_counter_test.py". Run it like this: nosetests --nocapture super_counter_test.py  You'll need ccm from git@github.com:tpatterson/ccm.git.
