Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/02/03 11:52:29 UTC
[jira] Created: (CASSANDRA-2105) Fix the read race condition in
CFStore for counters
Fix the read race condition in CFStore for counters
----------------------------------------------------
Key: CASSANDRA-2105
URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
Project: Cassandra
Issue Type: Bug
Affects Versions: 0.8
Reporter: Sylvain Lebresne
Assignee: Sylvain Lebresne
Fix For: 0.8
There is a (known) race condition during counter reads. For a standard
column family there is a short window during which a memtable is both active and
pending flush, and similarly a short window during which a 'memtable' is both
pending flush and an active sstable. For counters, this means a read can sometimes
reconcile the same counterColumn twice and thus over-count.
The current code changes this slightly by trading the possibility of counting a
given counterColumn twice for the possibility of missing a counterColumn. Thus it
trades over-counts for under-counts.
But this is no fix, and there is no hope of offering clients any kind of guarantee
on reads unless we fix this.
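To make the two failure modes concrete, here is a minimal, single-threaded sketch of a read that lands in the middle of a flush switch. It is not Cassandra code: the class CounterReadRace, the method readDuringFlush, and the two plain lists standing in for "memtables pending flush" and "live sstables" are all hypothetical names chosen for illustration. It also assumes that the over-count/under-count trade comes from the order in which the two lists are updated, which is only one way such a trade could arise.

```java
import java.util.ArrayList;
import java.util.List;

public class CounterReadRace {
    // Sums the counter deltas held by a list of (hypothetical) data sources.
    static long sum(List<Long> deltas) {
        long total = 0;
        for (long d : deltas) total += d;
        return total;
    }

    // Simulates a read that happens between the two steps of a flush switch.
    // One memtable holding a delta of +5 is being flushed, so the true value is 5.
    static long readDuringFlush(boolean publishSstableFirst) {
        long delta = 5L;
        List<Long> pendingFlush = new ArrayList<>(List.of(delta));
        List<Long> sstables = new ArrayList<>();

        // First half of the (non-atomic) switch.
        if (publishSstableFirst) {
            sstables.add(delta);      // data is now visible in both places
        } else {
            pendingFlush.clear();     // data is now visible in neither place
        }

        // The read happens here, in the middle of the switch.
        long observed = sum(pendingFlush) + sum(sstables);

        // Second half of the switch, too late for this read.
        if (publishSstableFirst) {
            pendingFlush.clear();
        } else {
            sstables.add(delta);
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(readDuringFlush(true));   // 10: over-count
        System.out.println(readDuringFlush(false));  // 0: under-count
    }
}
```

Either ordering leaves a window in which a reader gets the wrong total; only making the switch appear atomic to readers closes it.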
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (CASSANDRA-2105) Fix the read race condition in
CFStore for counters
Posted by "Hudson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012081#comment-13012081 ]
Hudson commented on CASSANDRA-2105:
-----------------------------------
Integrated in Cassandra #810 (See [https://hudson.apache.org/hudson/job/Cassandra/810/])
Atomically switch cfstore memtables and sstables
patch by slebresne; reviewed by jbellis for CASSANDRA-2284 (and CASSANDRA-2105)
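As an illustration of what "atomically switch" can mean in general, the sketch below keeps the memtables pending flush and the sstables in one immutable snapshot behind an AtomicReference, so a reader always sees both lists from the same instant. This is only a sketch under stated assumptions and not the CASSANDRA-2284 patch itself: the names AtomicView, View, addPendingMemtable, completeFlush and read are hypothetical, and counter deltas stand in for real memtables and sstables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class AtomicView {
    // Immutable snapshot of both collections (hypothetical stand-in types).
    static final class View {
        final List<Long> pendingFlush;
        final List<Long> sstables;

        View(List<Long> pendingFlush, List<Long> sstables) {
            this.pendingFlush = List.copyOf(pendingFlush);
            this.sstables = List.copyOf(sstables);
        }
    }

    private final AtomicReference<View> view =
            new AtomicReference<>(new View(List.of(), List.of()));

    // Registers a memtable (represented by its counter delta) as pending flush.
    void addPendingMemtable(long delta) {
        view.updateAndGet(v -> {
            List<Long> pending = new ArrayList<>(v.pendingFlush);
            pending.add(delta);
            return new View(pending, v.sstables);
        });
    }

    // Moves the oldest pending memtable into the sstable list in a single swap.
    void completeFlush() {
        view.updateAndGet(v -> {
            if (v.pendingFlush.isEmpty()) return v;
            List<Long> pending = new ArrayList<>(v.pendingFlush);
            List<Long> tables = new ArrayList<>(v.sstables);
            tables.add(pending.remove(0));
            return new View(pending, tables);
        });
    }

    // Reads from one snapshot, so no source is counted twice or missed.
    long read() {
        View v = view.get();
        long total = 0;
        for (long d : v.pendingFlush) total += d;
        for (long d : v.sstables) total += d;
        return total;
    }
}
```

Readers take no lock here; a writer that loses the compare-and-set simply rebuilds its snapshot and retries.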
> Fix the read race condition in CFStore for counters
> ----------------------------------------------------
>
> Key: CASSANDRA-2105
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2105
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.8
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Labels: counters
> Fix For: 0.8
>
> Attachments: 2115_option1_withLock.patch, 2115_option2_nolock.patch
>
> Original Estimate: 4h
> Remaining Estimate: 4h
>
> There is a (known) race condition during counter read. Indeed, for standard
> column family there is a small time during which a memtable is both active and
> pending flush and similarly a small time during which a 'memtable' is both
> pending flush and an active sstable. For counters that would imply sometime
> reconciling twice during a read the same counterColumn and thus over-counting.
> Current code changes this slightly by trading the possibility to count twice a
> given counterColumn by the possibility to miss a counterColumn. Thus it trades
> over-counts for under-counts.
> But this is no fix and there is no hope to offer clients any kind of guarantee
> on reads unless we fix this.
[jira] Updated: (CASSANDRA-2105) Fix the read race condition in
CFStore for counters
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-2105:
----------------------------------------
Attachment: 2115_option2_nolock.patch
2115_option1_withLock.patch
Attached not one but two options for this patch. I'm not sure which version to go with, so I'm asking for opinions.
Version 1 is the one extracted from #1546. It uses a ReadWriteLock to protect against the race condition.
Version 2 doesn't use a lock, so there is less chance of lock contention, which is always good. The only problem is that, in theory, it still suffers from a race condition. But I think this race condition is borderline impossible.
Basically, given a memtable m being flushed, let's call s(m) the sstable initially produced by its flushing, and let's denote by s'(m) any sstable resulting from the compaction of s(m). The race occurs if a read thread sees m when grabbing the references to the memtables being flushed and then sees s'(m) when grabbing the references to the sstables (seeing s(m) instead is the initial race condition, which is not impossible at all).
If it's unclear, the code has a comment explaining this that may be clearer.
So I'm not sure which version to go with. I may lean slightly towards Version 1 because I usually side with correctness before anything else, but since this is in a critical path it feels slightly wasteful to use a lock for this given how remote the race condition of Version 2 seems.
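For readers who want a picture of the lock-based approach, here is a minimal sketch of the general idea behind Version 1: the flush switch takes the write lock, and a read takes the read lock while it looks at both collections. It is not the attached patch. The class LockedSwitch and its methods are hypothetical, and counter deltas stand in for real memtables and sstables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockedSwitch {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> pendingFlush = new ArrayList<>();
    private final List<Long> sstables = new ArrayList<>();

    // Registers a memtable (represented by its counter delta) as pending flush.
    void addPendingMemtable(long delta) {
        lock.writeLock().lock();
        try {
            pendingFlush.add(delta);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Both steps of the switch happen under the write lock, so no reader
    // can observe the state in between them.
    void completeFlush() {
        lock.writeLock().lock();
        try {
            if (!pendingFlush.isEmpty()) {
                sstables.add(pendingFlush.remove(0));
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Readers share the read lock, so they only contend with a switch in progress.
    long read() {
        lock.readLock().lock();
        try {
            long total = 0;
            for (long d : pendingFlush) total += d;
            for (long d : sstables) total += d;
            return total;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

The cost mentioned above is visible here: every read on the critical path acquires and releases a lock, even though a switch is rarely in progress.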
[jira] [Resolved] (CASSANDRA-2105) Fix the read race condition in
CFStore for counters
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne resolved CASSANDRA-2105.
-----------------------------------------
Resolution: Fixed
Fixed by CASSANDRA-2284
[jira] Commented: (CASSANDRA-2105) Fix the read race condition in
CFStore for counters
Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/CASSANDRA-2105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003974#comment-13003974 ]
Sylvain Lebresne commented on CASSANDRA-2105:
---------------------------------------------
I've opened CASSANDRA-2284, which provides what I think is a better solution to this problem than the ones I attached here previously (I've opened it separately because it's a more generic solution, not just a counter-related fix).