Posted to commits@cassandra.apache.org by "Yang Yang (JIRA)" <ji...@apache.org> on 2011/06/15 08:43:47 UTC

[jira] [Issue Comment Edited] (CASSANDRA-2774) one way to make counter delete work better

    [ https://issues.apache.org/jira/browse/CASSANDRA-2774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049650#comment-13049650 ] 

Yang Yang edited comment on CASSANDRA-2774 at 6/15/11 6:42 AM:
---------------------------------------------------------------

Note that the main logic is rather simple and is concentrated in CounterColumn.reconcile().

Most of the coding, however, went into setting up a "lastDeleteTimestamp()" for a completely new incoming CounterUpdateColumn, since it does not have any history yet. The code in this part uses some rather messy changes and should definitely take a better route, but so far it is only a demonstration of the idea. People more familiar with the code path can suggest a better way.


Here is how it works:
When a CounterUpdateColumn comes in, we put it in the memtable, which goes through the reconcile process. The new code then obtains the state of the columns listed in the mutation, AFTER the mutation is applied. We check whether any columns are completely new, i.e. they have no matching column in the memtable, so their timestampOfLastDelete() is still "UNDECIDED". For these columns we do a read, find any existing columns for them in the SSTables, and assign the correct timestampOfLastDelete() to them. We do not read the SSTables up front because most counter adds will already have a matching column in the memtable, so we only incur the extra SSTable read cost when we start a new counter in the memtable.
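
The lazy lookup described above can be sketched roughly as follows. This is only a toy model, not the code in the patch: the class LazyEpochLookupSketch, the two maps, the Long.MIN_VALUE encoding of UNDECIDED, and the default epoch of 0 for a never-deleted counter are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the write path described above; not Cassandra code. */
final class LazyEpochLookupSketch {
    /** Assumed sentinel meaning "epoch not known yet". */
    static final long UNDECIDED = Long.MIN_VALUE;

    // Counter name -> timestampOfLastDelete, for the memtable and for data already on disk.
    final Map<String, Long> memtableEpoch = new HashMap<>();
    final Map<String, Long> sstableEpoch = new HashMap<>();
    int sstableReads = 0; // counts how often the expensive path is taken

    /** Applies one counter update and returns the epoch the column ends up with. */
    long applyUpdate(String name) {
        // Step 1: put the update into the memtable. A fresh update carries UNDECIDED;
        // if a column already exists there, the incoming one inherits its epoch.
        long epoch = memtableEpoch.merge(name, UNDECIDED, (existing, incoming) -> existing);
        // Step 2: only a brand-new column is still UNDECIDED after the mutation.
        if (epoch == UNDECIDED) {
            // Step 3: read the sstables once to find the last delete, if any
            // (0 for "never deleted" is an assumption of this sketch).
            sstableReads++;
            epoch = sstableEpoch.getOrDefault(name, 0L);
            memtableEpoch.put(name, epoch);
        }
        return epoch;
    }

    public static void main(String[] args) {
        LazyEpochLookupSketch sketch = new LazyEpochLookupSketch();
        sketch.sstableEpoch.put("hits", 42L); // a delete at timestamp 42 is already on disk
        System.out.println(sketch.applyUpdate("hits")); // first add: needs the sstable read
        System.out.println(sketch.applyUpdate("hits")); // second add: served from the memtable
        System.out.println(sketch.sstableReads);
    }
}
```

In this model the sstable read happens once per counter per memtable, which is the cost trade-off described above.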


reconcile rule:
   1)  newer epoch (timestampOfLastDelete()) wins
   2)  UNDECIDED inherits the timestampOfLastDelete() of the competitor (UNDECIDED can be seen ONLY when a new column comes into the memtable)
   3)  if timestampOfLastDelete() is the same, non-delete wins over delete
   4)  finally, use standard merging between CounterColumns
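
A minimal sketch of these four rules, under stated assumptions: the class EpochReconcileSketch and its Col type are made up for illustration, UNDECIDED is encoded as Long.MIN_VALUE, and rule 4 is reduced to summing plain totals, whereas the real CounterColumn merges per-replica counter contexts.

```java
/** Hypothetical, simplified model of the four rules; not the actual CounterColumn code. */
final class EpochReconcileSketch {
    /** Assumed sentinel for a column whose epoch is not known yet. */
    static final long UNDECIDED = Long.MIN_VALUE;

    /** One counter column: its epoch, whether it is a delete marker, and a plain total. */
    static final class Col {
        final long timestampOfLastDelete;
        final boolean isDelete;
        final long value; // stand-in for the real per-replica counter context

        Col(long timestampOfLastDelete, boolean isDelete, long value) {
            this.timestampOfLastDelete = timestampOfLastDelete;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    static Col reconcile(Col a, Col b) {
        // Rule 2: an UNDECIDED column inherits the competitor's epoch.
        long ea = a.timestampOfLastDelete == UNDECIDED
                ? b.timestampOfLastDelete : a.timestampOfLastDelete;
        long eb = b.timestampOfLastDelete == UNDECIDED
                ? a.timestampOfLastDelete : b.timestampOfLastDelete;
        // Rule 1: the newer epoch wins outright.
        if (ea != eb) {
            Col winner = ea > eb ? a : b;
            return new Col(Math.max(ea, eb), winner.isDelete, winner.value);
        }
        // Rule 3: within one epoch, a non-delete wins over a delete.
        if (a.isDelete != b.isDelete) {
            Col winner = a.isDelete ? b : a;
            return new Col(ea, false, winner.value);
        }
        if (a.isDelete) {
            return new Col(ea, true, 0L); // two deletes of the same epoch
        }
        // Rule 4: standard counter merge; summing totals is a simplification.
        return new Col(ea, false, a.value + b.value);
    }

    public static void main(String[] args) {
        Col add1 = new Col(0L, false, 1L);  // written before any delete
        Col delete = new Col(5L, true, 0L); // the delete bumps the epoch to 5
        Col add2 = new Col(5L, false, 2L);  // written after the delete
        System.out.println(reconcile(reconcile(add1, delete), add2).value);
        System.out.println(reconcile(reconcile(add1, add2), delete).value);
    }
}
```

With the add 1 / delete / add 2 sequence from the issue description, both merge orders in main end at a value of 2 in this model, because the older epoch is discarded no matter when it is merged.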





      was (Author: yangyangyyy):
    Note that the main logic is rather simple and is concentrated in CounterColumn.reconcile().

Most of the coding, however, went into setting up a "lastDeleteTimestamp()" for a completely new incoming CounterUpdateColumn, since it does not have any history yet. The code in this part uses some rather messy changes and should definitely take a better route, but so far it is only a demonstration of the idea. People more familiar with the code path can suggest a better way.


  
> one way to make counter delete work better
> ------------------------------------------
>
>                 Key: CASSANDRA-2774
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2774
>             Project: Cassandra
>          Issue Type: New Feature
>    Affects Versions: 0.8.0
>            Reporter: Yang Yang
>         Attachments: counter_delete.diff
>
>
> The current Counter does not work with delete, because different merging orders of sstables would produce different results. For example:
> add 1
> delete 
> add 2
> If the merging happens in the order 1--2, (1,2)--3, the result we see will be 2.
> If the merging is 1--3, (1,3)--2, the result will be 3.
> The issue is that a delete currently cannot separate the adds made before it from the adds made after it. A delete is supposed to create a completely new incarnation of the counter, or a new "lifetime", or "epoch". The new approach uses the concept of an "epoch number", so that each delete bumps up the epoch number. Since each write is replicated (replicate on write is almost always enabled in practice; if this is a concern, we could further force ROW in case of a delete), the epoch number is global to a replica set.
> Changes are attached. Existing tests pass fine; some tests are modified since the semantics have changed a bit. Some CQL tests do not pass in the original 0.8.0 source; that is not the fault of this change.
> see details at http://mail-archives.apache.org/mod_mbox/cassandra-user/201106.mbox/%3CBANLkTikQcgLSNwtT-9HvqpSeoo7SF58SnA@mail.gmail.com%3E
> The goal of this is to make delete work (at least with consistent behavior; yes, in case of a long network partition the behavior is not ideal, but it is consistent with the definition of a logical clock), so that we could have expiring Counters.
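
The order dependence in the add 1 / delete / add 2 example quoted above can be reproduced with a toy model. The sketch below is not Cassandra's code. It assumes that a delete is an ordinary timestamped tombstone, that two live counters merge by summing and keeping the newer timestamp, and that against a tombstone the newer timestamp simply wins. The class TimestampMergeSketch and its Cell type are hypothetical.

```java
/** Hypothetical toy model of timestamp-only merging; not Cassandra's actual code. */
final class TimestampMergeSketch {
    static final class Cell {
        final long timestamp;
        final boolean isDelete;
        final long value;

        Cell(long timestamp, boolean isDelete, long value) {
            this.timestamp = timestamp;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    static Cell merge(Cell a, Cell b) {
        // Two live counters: totals add up and the newer timestamp is kept.
        if (!a.isDelete && !b.isDelete) {
            return new Cell(Math.max(a.timestamp, b.timestamp), false, a.value + b.value);
        }
        // A tombstone is involved: the cell with the newer timestamp survives.
        return a.timestamp >= b.timestamp ? a : b;
    }

    public static void main(String[] args) {
        Cell add1 = new Cell(1L, false, 1L);
        Cell delete = new Cell(2L, true, 0L);
        Cell add2 = new Cell(3L, false, 2L);
        // Order 1--2, (1,2)--3: the tombstone removes add1 before add2 arrives.
        System.out.println(merge(merge(add1, delete), add2).value);
        // Order 1--3, (1,3)--2: the adds are summed first and outlive the older tombstone.
        System.out.println(merge(merge(add1, add2), delete).value);
    }
}
```

Under these assumptions the first order ends at 2 and the second at 3, matching the two results given in the issue description.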
