Posted to commits@cassandra.apache.org by "T Jake Luciani (JIRA)" <ji...@apache.org> on 2016/06/22 14:46:58 UTC

[jira] [Comment Edited] (CASSANDRA-9779) Append-only optimization

    [ https://issues.apache.org/jira/browse/CASSANDRA-9779?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344405#comment-15344405 ] 

T Jake Luciani edited comment on CASSANDRA-9779 at 6/22/16 2:46 PM:
--------------------------------------------------------------------

bq.  if you violate the INSERTS ONLY contract by updating existing rows, Cassandra will give you one of those versions back when you query it, but not necessarily the most recent.

It sounds like you are saying there are no guarantees.   

I've given this some thought and I think the best approach in which we can syntactically "do something" is to combine this ticket with the idea [~thobbs] touched on in CASSANDRA-9928. This might be what you are describing we should do but I'll just restate it.

bq. One possible solution is to require that all non-PK columns that are in a view PK be updated simultaneously. T Jake Luciani mentioned possible problems from read repair, but it seems like, with this restriction in place, any read repairs would end up repairing all non-PK columns at once.

Basically, this would add a mode where we INSERT *all* columns every time.  While this sounds restrictive, it also forces the user to deal with the fact that making updates is conceptually/logistically hard, since we would kick out all client mutations that don't specify all columns.  Sure, you could subvert this, but to me at least, the server can alert the user that updating existing data the way you would in other tables is hard.

So the proposal is:

  * Add a table level flag/syntax to mark that a table is INSERT ONLY (which can be altered if there's an emergency).
  * Reject any INSERTS/UPSERTS that do not specify all columns
  * Possibly always return the earliest row if there is a conflict.
  * When writing to the memtable we can add a putIfAbsent method to reject/ignore updates (to cover some minimal bases) 
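
To make the second and fourth bullets concrete, here is a minimal sketch, not Cassandra's actual memtable code. The class {{InsertOnlyTable}}, its string-keyed row model, and the exception used for rejection are all assumptions made for illustration. Note that {{putIfAbsent}} keeps the first write to *arrive* at this node, which is not necessarily the write with the earliest timestamp, so it only covers the minimal bases mentioned above.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListMap;

/** Hypothetical stand-in for a memtable of an INSERT ONLY table. */
final class InsertOnlyTable {
    // Every non-PK column the table declares (assumed to be known from the schema).
    private final Set<String> columns;
    // Sorted concurrent map of primary key -> full row.
    private final ConcurrentSkipListMap<String, Map<String, String>> rows =
            new ConcurrentSkipListMap<>();

    InsertOnlyTable(Set<String> columns) {
        this.columns = Set.copyOf(columns);
    }

    /**
     * Stores the row only if the key is not present yet.
     *
     * @return true if the row was stored, false if an earlier row was kept
     * @throws IllegalArgumentException if the mutation does not specify all columns
     */
    boolean insert(String key, Map<String, String> values) {
        if (!values.keySet().equals(columns)) {
            throw new IllegalArgumentException(
                    "INSERT must specify all columns " + columns + ", got " + values.keySet());
        }
        // putIfAbsent returns null only when no row existed for this key.
        return rows.putIfAbsent(key, Map.copyOf(values)) == null;
    }

    Map<String, String> get(String key) {
        return rows.get(key);
    }
}
```

With this sketch, a second {{insert}} for the same key returns false and leaves the original row untouched, while a mutation missing a column is rejected before it reaches the map.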


> Append-only optimization
> ------------------------
>
>                 Key: CASSANDRA-9779
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9779
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>             Fix For: 3.x
>
>
> Many common workloads are append-only: that is, they insert new rows but do not update existing ones.  However, Cassandra has no way to infer this and so it must treat all tables as if they may experience updates in the future.
> If we added syntax to tell Cassandra about this ({{WITH INSERTS ONLY}} for instance) then we could do a number of optimizations:
> - Compaction would only need to worry about defragmenting partitions, not rows.  We could default to DTCS or similar.
> - CollationController could stop scanning sstables as soon as it finds a matching row
> - Most importantly, materialized views wouldn't need to worry about deleting prior values, which would eliminate the majority of the MV overhead
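
The read-path optimization in the description can be sketched roughly as follows. This is an illustration only: {{InsertOnlyReader}} is a hypothetical class, each sstable is modelled as a plain map, and none of this reflects the real CollationController API. The point is that when rows are never updated, the first match is the whole answer, so the scan can stop there instead of merging versions from every sstable.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical reader over sstables modelled as key -> row maps. */
final class InsertOnlyReader {
    private final List<Map<String, String>> sstables;
    // Counts how many sstables have been consulted, to show the early exit.
    int sstablesScanned = 0;

    InsertOnlyReader(List<Map<String, String>> sstables) {
        this.sstables = List.copyOf(sstables);
    }

    /** Returns the first matching row and stops scanning immediately. */
    Optional<String> read(String key) {
        for (Map<String, String> sstable : sstables) {
            sstablesScanned++;
            String row = sstable.get(key);
            if (row != null) {
                return Optional.of(row); // no later sstable can hold a newer version
            }
        }
        return Optional.empty();
    }
}
```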


