Posted to commits@cassandra.apache.org by "Pavol Slamka (JIRA)" <ji...@apache.org> on 2014/12/29 11:56:13 UTC

[jira] [Created] (CASSANDRA-8543) Allow custom code to control behavior of reading and compaction

Pavol Slamka created CASSANDRA-8543:
---------------------------------------

             Summary: Allow custom code to control behavior of reading and compaction
                 Key: CASSANDRA-8543
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8543
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Pavol Slamka
            Priority: Minor


When series data is stored in blobs for speed, it is sometimes necessary to change only a few values of a single blob (say, a few integers out of 1024). Right now one could rewrite these using compare-and-set and versioning: read the blob and its version, change a few values, then write the whole updated blob with an incremented version if the version did not change, and repeat the whole process otherwise (an optimistic approach). However, compare-and-set brings some overhead.

Let's try to leave out compare-and-set. Instead of reading and updating, let's write only a "blank" blob with just the few changed values set. A blank blob contains a special placeholder, such as NULL or the max value of int, in every position that is not being changed. Since this write in fact only appends a new SSTable record, the old data is not overwritten yet. That happens during read or compaction.

If we provided a custom read and a custom compaction that did not replace the old blob with the new "sparse blank" blob, but instead replaced values in the first blob (first SSTable record) with only the "non-blank" values from the second blob (second SSTable record), we would achieve fast partial blob updates without compare-and-set, on a last-write-wins basis.

Is such an approach feasible? Would it be possible to customize Cassandra so that custom code for compaction and data reading could be provided for a column (blob)?
There may be other better solutions, but speedwise, this seems best to me. Sorry for any mistakes, I am new to Cassandra.
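
To make the proposed merge concrete, here is a minimal sketch in Python of the overlay that a custom read or compaction would perform. It is not Cassandra code and uses no Cassandra API. The blob layout (1024 big-endian 32-bit integers), the choice of max int as the blank placeholder, and the names BLANK, sparse_update, merge and resolve are all assumptions made for illustration.

```python
import struct

BLANK = 2**31 - 1  # assumed placeholder: max value of a signed 32-bit int
COUNT = 1024       # integers per blob, matching the example in the description

_FMT = ">%di" % COUNT  # assumed layout: COUNT big-endian 32-bit integers


def pack(values):
    """Serialize COUNT integers into one blob."""
    return struct.pack(_FMT, *values)


def unpack(blob):
    """Deserialize a blob back into a list of COUNT integers."""
    return list(struct.unpack(_FMT, blob))


def sparse_update(changes):
    """Build a "blank" blob that carries only the changed positions."""
    values = [BLANK] * COUNT
    for index, value in changes.items():
        values[index] = value
    return pack(values)


def merge(older, newer):
    """Overlay the non-blank values of `newer` onto `older`."""
    merged = [n if n != BLANK else o
              for o, n in zip(unpack(older), unpack(newer))]
    return pack(merged)


def resolve(records):
    """Fold (timestamp, blob) records from oldest to newest (last write wins)."""
    ordered = sorted(records, key=lambda record: record[0])
    result = ordered[0][1]
    for _, blob in ordered[1:]:
        result = merge(result, blob)
    return result
```

For example, resolve([(1, pack(range(COUNT))), (2, sparse_update({3: -7}))]) yields a blob in which position 3 holds -7 and every other position keeps its original value. One consequence of this design is that the placeholder value itself can never be stored as real data.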

Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)