Posted to commits@cassandra.apache.org by "ivan (JIRA)" <ji...@apache.org> on 2011/09/01 02:09:12 UTC

[jira] [Commented] (CASSANDRA-3070) counter repair

    [ https://issues.apache.org/jira/browse/CASSANDRA-3070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095012#comment-13095012 ] 

ivan commented on CASSANDRA-3070:
---------------------------------

Hi Sylvain,

Our sstables contain sensitive information, so I can't provide them. Sorry.

I reloaded the sstables in our test environment and captured a new output log ().
This new log contains two new debug messages:
1. lines containing the string "CF resolve" (printed at the beginning of the resolve method in src/java/org/apache/cassandra/db/ColumnFamily.java)
2. lines containing the string "CF addAll" (printed at the beginning of the addAll method in src/java/org/apache/cassandra/db/ColumnFamily.java)

We have a backup of the sstables containing these counters, so I can run any tests on them.
We have a 6 node cluster using RF=3.

When we ran into problems with some counters, I started debugging.

With consistency level LOCAL_QUORUM we get the same answer from all servers, but with CL ONE, 2 of the 6 servers return a lower number.
The values from those 2 servers were lower by 3 than the values from the other servers.
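One way to see why the two consistency levels behave differently is to count how many replicas the coordinator waits for before answering. The sketch below is a hypothetical illustration, not Cassandra code: it assumes a single data center, uses the usual strict-majority rule (localRf / 2 + 1) for LOCAL_QUORUM, and ignores any background read repair. The class and method names are invented for this example.

```java
/**
 * Hypothetical sketch (not Cassandra code): how many replicas a read waits
 * for before answering, assuming a single data center.
 */
public class ReadLevelSketch {
    enum Level { ONE, LOCAL_QUORUM }

    /** Replicas whose responses the coordinator waits for at the given level. */
    static int replicasAwaited(Level level, int localRf) {
        if (localRf < 1) {
            throw new IllegalArgumentException("localRf must be >= 1");
        }
        return switch (level) {
            case ONE -> 1;                        // any single replica answers
            case LOCAL_QUORUM -> localRf / 2 + 1; // strict majority in the local DC
        };
    }

    /** Responses can only be compared (digest check) when at least two are awaited. */
    static boolean comparesResponses(Level level, int localRf) {
        return replicasAwaited(level, localRf) >= 2;
    }

    public static void main(String[] args) {
        int rf = 3; // RF=3, as in the report
        for (Level l : Level.values()) {
            System.out.println(l + ": waits for " + replicasAwaited(l, rf)
                + " replica(s), compares responses: " + comparesResponses(l, rf));
        }
    }
}
```

With RF=3 this model waits for 1 replica at ONE and 2 at LOCAL_QUORUM, so only the LOCAL_QUORUM read gets a chance to notice a digest mismatch before replying, which is consistent with a stale replica being visible only at ONE.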

I found the following:
- server (10.20.255.55) notices when there is a digest mismatch (using LOCAL_QUORUM)
- server (10.20.255.55) sends a repair (rowmutation) message to related servers
- server (10.20.255.53) receives this mutation (which contains the same total() value the client received)
- when the mutation is handled by Memtable.put(), ColumnFamily.resolve() produces a different result
  (the data in the Memtable holds a delta, and the correct counter value is not applied in place of this delta)
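To make the repair flow above concrete, here is a deliberately simplified, hypothetical model of a counter split into per-node shards, each carrying a logical clock and a count. It is NOT Cassandra's CounterContext: it assumes that reconciling two replicas keeps, for each node, the shard with the higher clock, and that the client-visible value is the sum of all shard counts (analogous to total()). It does not model delta shards like the one observed in the Memtable, which is exactly where the real behaviour may differ. All names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified, hypothetical model of a distributed counter: each node owns a
 * shard (logical clock + count). This is NOT Cassandra's CounterContext.
 */
public class CounterMergeSketch {
    record Shard(long clock, long count) {}

    /** Value a client sees: sum of every node's shard count (cf. total()). */
    static long total(Map<String, Shard> ctx) {
        long sum = 0;
        for (Shard s : ctx.values()) {
            sum += s.count();
        }
        return sum;
    }

    /** Reconcile two replicas: per node, keep the shard with the higher clock. */
    static Map<String, Shard> reconcile(Map<String, Shard> a, Map<String, Shard> b) {
        Map<String, Shard> out = new HashMap<>(a);
        for (Map.Entry<String, Shard> e : b.entrySet()) {
            // x is the shard already present (from a), y is the one from b
            out.merge(e.getKey(), e.getValue(),
                (x, y) -> x.clock() >= y.clock() ? x : y);
        }
        return out;
    }

    public static void main(String[] args) {
        // Replica A saw node n1's latest update; replica B saw n2's latest.
        Map<String, Shard> a = Map.of("n1", new Shard(3, 10), "n2", new Shard(2, 5));
        Map<String, Shard> b = Map.of("n1", new Shard(2, 7), "n2", new Shard(4, 8));
        Map<String, Shard> repaired = reconcile(a, b);
        System.out.println(total(a) + " " + total(b) + " " + total(repaired)); // 15 15 18
    }
}
```

In this model both replicas report 15 even though their shard contents differ, and only the reconciled context (the analogue of the repair mutation) yields 18. A replica that applied such a repair correctly should converge to the reconciled total; the report above suggests that, when the Memtable already holds a delta, the real resolve path does not.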

I don't know whether the resolved value is correct (I suspect it is not, because its total() value seems to be wrong), since I don't know in detail how counters work.



Regards,
ivan


> counter repair
> --------------
>
>                 Key: CASSANDRA-3070
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3070
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.4
>            Reporter: ivan
>            Assignee: Sylvain Lebresne
>         Attachments: counter_local_quroum_maybeschedulerepairs.txt, counter_local_quroum_maybeschedulerepairs_2.txt
>
>
> Hi!
> We have some counters out of sync, but repair doesn't sync the values.
> We tried nodetool repair.
> We use LOCAL_QUORUM for reads. A repair row mutation is sent to the other nodes when a bad row is read, but the counters aren't repaired by the mutation.
> Output from two nodes has been uploaded. (Some new debug messages were added.)

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira