Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2012/07/04 03:13:34 UTC

[jira] [Comment Edited] (CASSANDRA-4285) Atomic, eventually-consistent batches

    [ https://issues.apache.org/jira/browse/CASSANDRA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406221#comment-13406221 ] 

Jonathan Ellis edited comment on CASSANDRA-4285 at 7/4/12 1:12 AM:
-------------------------------------------------------------------

Here's the data model I'm leaning towards:

{code}
CREATE TABLE batchlog (
  coordinator inet,
  shard       int,
  id          uuid,
  data        blob,
  -- id clusters the entries so each (coordinator, shard) partition holds many of them
  PRIMARY KEY ((coordinator, shard), id)
);
{code}

(Using CASSANDRA-4179 syntax for composite partition keys.)  As discussed in CASSANDRA-1311, this is going to be a very tombstone-heavy CF, since the workload looks like this:

# insert batchlog entry
# replicate batch
# remove batchlog entry
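A minimal in-memory sketch of that three-step lifecycle, assuming a batch can be treated as an opaque serialized blob. The names (BatchlogLifecycle, execute, and the Consumer standing in for step 2) are hypothetical, not Cassandra's actual classes. It shows why the CF is tombstone-heavy: every batch that replicates successfully leaves one tombstone behind, and only a batch whose replication fails keeps a live entry for replay.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.UUID;
import java.util.function.Consumer;

// Hypothetical in-memory model of the batchlog lifecycle (not Cassandra code).
final class BatchlogLifecycle {
    // Partition key "coordinator:shard" -> (entry id -> serialized batch),
    // sorted by id like a clustering column. A null value models the
    // tombstone written by the delete in step 3.
    private final Map<String, TreeMap<UUID, byte[]>> batchlog = new HashMap<>();
    private int tombstones = 0;

    // Step 1: record the batch before any replica sees it.
    void log(String partitionKey, UUID id, byte[] data) {
        batchlog.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(id, data);
    }

    // Step 3: delete the entry; the delete is itself a write (a tombstone).
    void remove(String partitionKey, UUID id) {
        batchlog.computeIfAbsent(partitionKey, k -> new TreeMap<>()).put(id, null);
        tombstones++;
    }

    // Runs all three steps. If replicate throws, step 3 never runs and the
    // entry stays live so that it can be replayed later.
    void execute(String partitionKey, byte[] data, Consumer<byte[]> replicate) {
        UUID id = UUID.randomUUID();
        log(partitionKey, id, data);
        replicate.accept(data); // step 2: send the batch's mutations to their replicas
        remove(partitionKey, id);
    }

    int tombstoneCount() {
        return tombstones;
    }

    // Entries that were logged but never removed, i.e. candidates for replay.
    int liveEntryCount() {
        int live = 0;
        for (TreeMap<UUID, byte[]> partition : batchlog.values()) {
            for (byte[] value : partition.values()) {
                if (value != null) {
                    live++;
                }
            }
        }
        return live;
    }
}
```

In the normal case the log holds one tombstone per batch and no live entries, which is exactly the pattern that makes wide, unsharded partitions painful.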

So we're going to want to shard each coordinator's entries to avoid the problems attendant to Very Wide Rows.  Unlike most such workloads, we don't actually need to time-order our entries; since batches are idempotent, replay order won't matter.  Thus, we can just pick a random shard id (in a known range, say 0 to 63) for each entry, and on replay we will read from every shard.
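A sketch of random sharding plus replay, under stated assumptions: SHARD_COUNT = 64 follows the "say 0 to 63" example, a batch is modeled as a map from column name to a timestamped value, and all class and method names are hypothetical. Order independence here comes from each write carrying its own timestamp (as Cassandra writes do) and conflicting writes being reconciled last-write-wins; re-applying the same batch then changes nothing, which is the idempotence the argument relies on.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical model of one coordinator's sharded batchlog (not Cassandra code).
final class ShardedBatchlog {
    static final int SHARD_COUNT = 64; // "a known range, say 0 to 63"

    // One column write: a value plus the timestamp it was written with.
    static final class Cell {
        final long timestamp;
        final String value;

        Cell(long timestamp, String value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // shard -> (entry id -> the batch's writes, keyed by column name)
    private final Map<Integer, Map<UUID, Map<String, Cell>>> shards = new HashMap<>();

    // No time ordering is needed, so each entry goes to a random shard.
    UUID append(Map<String, Cell> batch) {
        int shard = ThreadLocalRandom.current().nextInt(SHARD_COUNT);
        UUID id = UUID.randomUUID();
        shards.computeIfAbsent(shard, s -> new HashMap<>()).put(id, batch);
        return id;
    }

    // Last-write-wins: the higher timestamp wins; equal timestamps are broken
    // by comparing values so the outcome never depends on arrival order.
    static boolean wins(Cell incoming, Cell current) {
        if (incoming.timestamp != current.timestamp) {
            return incoming.timestamp > current.timestamp;
        }
        return incoming.value.compareTo(current.value) > 0;
    }

    static void apply(Map<String, Cell> store, Map<String, Cell> batch) {
        for (Map.Entry<String, Cell> write : batch.entrySet()) {
            store.merge(write.getKey(), write.getValue(),
                    (current, incoming) -> wins(incoming, current) ? incoming : current);
        }
    }

    // Replay reads every shard, since nothing records which shards hold entries.
    void replay(Map<String, Cell> store) {
        for (int shard = 0; shard < SHARD_COUNT; shard++) {
            Map<UUID, Map<String, Cell>> entries = shards.getOrDefault(shard, Map.of());
            for (Map<String, Cell> batch : entries.values()) {
                apply(store, batch);
            }
        }
    }
}
```

The shard count trades replay cost (one partition read per shard per coordinator) against partition width; 64 is only the example figure from the text.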

Other notes:
- I think we can cheat in the replication strategy: since the coordinator address is part of the partition key, the strategy can avoid replicating an entry back to the coordinator itself
- default RF will be 1; operators can increase if desired
- operators can also disable [local] commitlog on the batchlog CF, if desired
- gcgs (gc_grace_seconds) can safely be set to zero in all cases; the worst that can happen is that we replay a write a second time, which is not a problem
- Currently we always write tombstones to sstables during Memtable flush.  We should add a check for gcgs=0 that does an extra removeDeleted pass, which would leave the actual sstable contents for batchlog almost empty (since in the normal, everything-is-working case, the entry gets deleted while it is still in the memtable).
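A toy sketch of the proposed flush-time check, assuming a memtable that maps entry ids to serialized batches and represents a delete as a null tombstone. BatchlogMemtable and flush(int) are hypothetical names, and the skip inside flush only mimics the extra removeDeleted pass described above; it is not Cassandra's flush code. With gcgs = 0, an entry inserted and deleted within one memtable never reaches disk. If an older sstable still held the entry, dropping its tombstone would let it resurface, and per the gcgs note the batch would simply be replayed a second time.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy memtable for the batchlog CF (not Cassandra code).
final class BatchlogMemtable {
    // entry id -> serialized batch; a null value is a tombstone
    private final Map<String, byte[]> cells = new LinkedHashMap<>();

    void insert(String id, byte[] data) {
        cells.put(id, data);
    }

    // A delete that arrives while the entry is still in memory overwrites it
    // with a tombstone, so the live data is already gone from the memtable.
    void delete(String id) {
        cells.put(id, null);
    }

    // Returns the ids that would be written to the new sstable.
    List<String> flush(int gcGraceSeconds) {
        List<String> written = new ArrayList<>();
        for (Map.Entry<String, byte[]> cell : cells.entrySet()) {
            boolean tombstone = cell.getValue() == null;
            // Proposed extra pass: with gcgs = 0, drop tombstones instead of
            // writing them out. Otherwise tombstones are flushed as today.
            if (tombstone && gcGraceSeconds == 0) {
                continue;
            }
            written.add(cell.getKey());
        }
        cells.clear();
        return written;
    }
}
```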
                
> Atomic, eventually-consistent batches
> -------------------------------------
>
>                 Key: CASSANDRA-4285
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4285
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: API, Core
>            Reporter: Jonathan Ellis
>
> I discussed this in the context of triggers (CASSANDRA-1311), but it's useful as a standalone feature as well.
