Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/02/03 11:52:29 UTC

[jira] Created: (CASSANDRA-2105) Fix the read race condition in CFStore for counters

Fix the read race condition in CFStore for counters 
----------------------------------------------------

                 Key: CASSANDRA-2105
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8
            Reporter: Sylvain Lebresne
            Assignee: Sylvain Lebresne
             Fix For: 0.8


There is a (known) race condition during counter reads. For a standard
column family, there is a short window during which a memtable is both active
and pending flush, and similarly a short window during which a 'memtable' is
both pending flush and an active sstable. For counters, this means a read can
sometimes reconcile the same counterColumn twice and thus over-count.

The current code changes this slightly by trading the possibility of counting
a given counterColumn twice for the possibility of missing a counterColumn.
Thus it trades over-counts for under-counts.

But this is not a fix, and there is no hope of offering clients any kind of
guarantee on reads unless we fix this.
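
To make the over-count concrete, here is a minimal single-threaded sketch of the window described above. It is a toy model, not Cassandra code: the class CounterRaceSketch, its fields, and the two-step switch (beginSwitch / finishSwitch) are all hypothetical names chosen for illustration, and each source simply holds one counter delta.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Cassandra code: each source holds one counter delta.
final class CounterRaceSketch {
    static final class Source {
        final long delta;
        Source(long delta) { this.delta = delta; }
    }

    Source active = new Source(5);
    final List<Source> pendingFlush = new ArrayList<Source>();
    final List<Source> sstables = new ArrayList<Source>();

    // A read sums every source it can see, standing in for counter reconciliation.
    long read() {
        long total = active.delta;
        for (Source s : pendingFlush) total += s.delta;
        for (Source s : sstables) total += s.delta;
        return total;
    }

    // First half of a non-atomic switch: the memtable is registered as pending flush
    // while it is still the active memtable.
    void beginSwitch() { pendingFlush.add(active); }

    // Second half: a fresh, empty memtable becomes active.
    void finishSwitch() { active = new Source(0); }

    // Returns the value read before, during, and after the switch.
    static long[] demo() {
        CounterRaceSketch store = new CounterRaceSketch();
        long before = store.read();
        store.beginSwitch();
        long during = store.read(); // the same memtable is visible twice here
        store.finishSwitch();
        long after = store.read();
        return new long[] { before, during, after };
    }
}
```

In this model the read taken between the two halves of the switch sees the same delta through both the active memtable and the pending-flush list, which is the over-count. Reversing the two steps would instead open a window where the delta is visible through neither, which corresponds to the under-count trade mentioned above.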


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2105) Fix the read race condition in CFStore for counters

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012081#comment-13012081 ] 

Hudson commented on CASSANDRA-2105:
-----------------------------------

Integrated in Cassandra #810 (See [https://hudson.apache.org/hudson/job/Cassandra/810/])
    Atomically switch cfstore memtables and sstables
patch by slebresne; reviewed by jbellis for CASSANDRA-2284 (and CASSANDRA-2105)


> Fix the read race condition in CFStore for counters 
> ----------------------------------------------------
>
>                 Key: CASSANDRA-2105
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>              Labels: counters
>             Fix For: 0.8
>
>         Attachments: 2115_option1_withLock.patch, 2115_option2_nolock.patch
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h


[jira] Updated: (CASSANDRA-2105) Fix the read race condition in CFStore for counters

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2105:
----------------------------------------

    Attachment: 2115_option2_nolock.patch
                2115_option1_withLock.patch

Attached not 1 but 2 options for this patch. I'm not sure which version to go with, so I'm asking for opinions.

Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect against the race condition.

Version 2 doesn't use a lock, so there is less chance of lock contention, which is always good. The only problem is that, in theory, it still suffers from a race condition. But I think this race condition is borderline impossible.
Basically, given a memtable m being flushed, let's call s(m) the sstable initially produced by its flushing and let's denote by s'(m) any sstable resulting from the compaction of s(m). The race occurs if a read thread sees m when grabbing the references to the memtables being flushed and sees s'(m) when grabbing the references to the sstables (not s(m): that case is the initial race condition, which is not impossible at all).
If it's unclear, the code has a comment explaining this that may be clearer.

So I'm not sure which version to go with. I may lean slightly towards Version 1 because I usually side with correctness before anything else, but since this is in a critical path it feels slightly wasteful to use a lock for this given how remote the race condition of version 2 seems.
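
For readers without the patches at hand, the general shape of a lock-based approach like Version 1 can be sketched as follows. This is not the attached patch: the class LockedViewSketch and its methods are hypothetical, and sources are reduced to plain counter deltas. The only thing taken from the description above is the use of a ReadWriteLock, here java.util.concurrent.locks.ReentrantReadWriteLock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of the lock-based idea; not the attached patch.
final class LockedViewSketch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private long activeDelta = 5;
    private final List<Long> pendingFlush = new ArrayList<Long>();
    private final List<Long> sstables = new ArrayList<Long>();

    // Readers share the read lock, so they never observe a half-done switch.
    long read() {
        lock.readLock().lock();
        try {
            long total = activeDelta;
            for (long d : pendingFlush) total += d;
            for (long d : sstables) total += d;
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Both steps of the memtable switch happen under the write lock.
    void switchMemtable() {
        lock.writeLock().lock();
        try {
            pendingFlush.add(activeDelta);
            activeDelta = 0;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Moving a flushed memtable to the sstable list is also one locked step.
    void flushCompleted() {
        lock.writeLock().lock();
        try {
            sstables.add(pendingFlush.remove(0));
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

The cost discussed above is visible in the sketch: every read on the critical path has to acquire the read lock, even though switches and flushes are rare compared with reads.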



[jira] [Resolved] (CASSANDRA-2105) Fix the read race condition in CFStore for counters

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-2105.
-----------------------------------------

    Resolution: Fixed

Fixed by CASSANDRA-2284


[jira] Commented: (CASSANDRA-2105) Fix the read race condition in CFStore for counters

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003974#comment-13003974 ] 

Sylvain Lebresne commented on CASSANDRA-2105:
---------------------------------------------

I've opened CASSANDRA-2284, which provides what I think is a better solution to this problem than the ones I previously attached here (I've opened it separately because it's a more generic solution, not just a counter-related fix).
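
The CASSANDRA-2284 patch is not reproduced in this thread, so the following is only a sketch of one common way to switch several collections atomically without a lock: keep them in a single immutable snapshot behind an AtomicReference and replace the whole snapshot with compareAndSet. The names AtomicViewSketch and View are hypothetical, sources are reduced to counter deltas, and nothing here should be read as the actual 2284 implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an atomically swapped snapshot; not the CASSANDRA-2284 patch.
final class AtomicViewSketch {
    // Immutable snapshot of everything a read needs to consult.
    static final class View {
        final long activeDelta;
        final List<Long> pendingFlush;
        final List<Long> sstables;

        View(long activeDelta, List<Long> pendingFlush, List<Long> sstables) {
            this.activeDelta = activeDelta;
            this.pendingFlush = Collections.unmodifiableList(new ArrayList<Long>(pendingFlush));
            this.sstables = Collections.unmodifiableList(new ArrayList<Long>(sstables));
        }
    }

    private final AtomicReference<View> view = new AtomicReference<View>(
            new View(5, Collections.<Long>emptyList(), Collections.<Long>emptyList()));

    // A read takes one snapshot and uses it throughout, so each source is seen once.
    long read() {
        View v = view.get();
        long total = v.activeDelta;
        for (long d : v.pendingFlush) total += d;
        for (long d : v.sstables) total += d;
        return total;
    }

    // Active memtable becomes pending flush and a fresh one takes over, in one swap.
    void switchMemtable() {
        View cur, next;
        do {
            cur = view.get();
            List<Long> pending = new ArrayList<Long>(cur.pendingFlush);
            pending.add(cur.activeDelta);
            next = new View(0, pending, cur.sstables);
        } while (!view.compareAndSet(cur, next));
    }

    // Oldest pending memtable leaves the pending list and joins the sstables, in one swap.
    void flushCompleted() {
        View cur, next;
        do {
            cur = view.get();
            List<Long> pending = new ArrayList<Long>(cur.pendingFlush);
            List<Long> tables = new ArrayList<Long>(cur.sstables);
            tables.add(pending.remove(0));
            next = new View(cur.activeDelta, pending, tables);
        } while (!view.compareAndSet(cur, next));
    }
}
```

Compared with a read-write lock, readers in this sketch pay only for one volatile read of the reference, while writers pay for copying the lists and retrying on contention.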


