Posted to dev@cassandra.apache.org by Bhuvan Rawal <bh...@gmail.com> on 2016/12/10 22:48:21 UTC

Proposal - Feature to have versioning in Cassandra

Hi Devs!

Since this concerns a possible future improvement to Cassandra, I thought
this was the appropriate mailing list.

I was reading a bit about HBase. It provides a facility to version rows,
where the number of versions to keep, etc. can be specified in the schema.

This can be achieved in Cassandra today as well, but it requires a bit more
involvement from the client. Here is how it can be done currently - make
the last clustering column in the schema a timestamp. Each time a row is
written, write it with a new timestamp; on read, make sure the row with the
latest timestamp is read. If the number of rows returned exceeds the number
of versions to keep, issue a delete for the oldest ones (this could be done
while reading, or via a read-before-write during a write).
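
To make the client-side approach concrete, here is a minimal sketch in
plain Python. It models a partition as an in-memory dict rather than
calling a real Cassandra driver; the names write_version, read_latest and
MAX_VERSIONS are hypothetical and exist only for illustration.

```python
from collections import defaultdict

MAX_VERSIONS = 3  # assumed number of versions to keep (hypothetical setting)


def write_version(table, key, ts, value, max_versions=MAX_VERSIONS):
    """Write a new version, then delete the oldest rows beyond the limit.

    `table` maps partition key -> {timestamp clustering value: row value}.
    Returns the timestamps that were deleted.
    """
    partition = table[key]
    partition[ts] = value  # stands in for an INSERT with a new timestamp
    excess = max(0, len(partition) - max_versions)
    surplus = sorted(partition)[:excess]  # oldest timestamps first
    for old_ts in surplus:
        del partition[old_ts]  # stands in for a client-issued DELETE
    return surplus


def read_latest(table, key):
    """Return (timestamp, value) of the newest version, or None if absent."""
    partition = table.get(key)
    if not partition:
        return None
    latest_ts = max(partition)
    return latest_ts, partition[latest_ts]


table = defaultdict(dict)
for ts in (1, 2, 3, 4):
    write_version(table, "user1", ts, f"v{ts}")
print(read_latest(table, "user1"))  # (4, 'v4')
```

Here the trim happens on the write path; doing it on the read path instead
would only move the surplus check into read_latest.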

I believe this feature can be brought to Cassandra natively. This is what I
propose:
1. When creating the schema, it can be specified that versioning is on. In
that case it should be validated that the last clustering column is a
timestamp.
2. Whenever a write is performed, we read the existing partition, merge the
previous row into the current row, and insert the result with the current
timestamp in the clustering column.
3. If the number of versions in the partition exceeds the version count
specified in the schema, issue a delete for the oldest version.

All of this can be done locally.
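
The three steps above can be sketched as follows. This is only an
in-memory Python model under simplifying assumptions (a single clustering
column, timestamps that only increase, one logical row per partition); the
VersionedTable class and its fields are hypothetical and are not Cassandra
internals.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedTable:
    clustering_columns: list  # [(name, type), ...], hypothetical schema model
    max_versions: int         # version count specified in the schema
    partitions: dict = field(default_factory=dict)

    def __post_init__(self):
        # Step 1: versioning requires the last clustering column
        # to be a timestamp.
        if (not self.clustering_columns
                or self.clustering_columns[-1][1] != "timestamp"):
            raise ValueError("last clustering column must be a timestamp")
        if self.max_versions < 1:
            raise ValueError("max_versions must be at least 1")

    def write(self, key, ts, columns):
        rows = self.partitions.setdefault(key, {})
        # Step 2: read the previous (latest) row and merge it into the
        # new one, so unchanged columns carry over to the new version.
        previous = rows[max(rows)] if rows else {}
        rows[ts] = {**previous, **columns}
        # Step 3: delete the oldest versions beyond the configured count.
        excess = max(0, len(rows) - self.max_versions)
        for old_ts in sorted(rows)[:excess]:
            del rows[old_ts]


t = VersionedTable([("updated_at", "timestamp")], max_versions=2)
t.write("k", 1, {"a": 1})
t.write("k", 2, {"b": 2})
t.write("k", 3, {"a": 9})
print(t.partitions["k"])  # {2: {'a': 1, 'b': 2}, 3: {'a': 9, 'b': 2}}
```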

There is another possibility that looks promising: we can have a
materialized view that maintains the versioning. This has the benefit of
making versioning possible for existing rows without having to touch the
base table and possibly corrupt it. Also, this would be a local MV (the
partition resides locally), so the performance impact should be smaller.

Steps for this could be - from the user's perspective:
1. Create an MV with versioning on. (Internally, that will mean a
timestamp clustering column is created in the schema.)
2. While writing into the MV, a read can be performed along with the
insert, and if the version count is higher than the max, a delete can be
issued.
3. On read, the user can specify the number of versions to be returned;
Cassandra, after reading the complete partition, can filter out the older
versions.
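
The read-side filtering in step 3 could look roughly like this. Again this
is a plain-Python sketch over an in-memory partition, and read_versions is
a hypothetical helper name, not an existing Cassandra API.

```python
def read_versions(partition, limit):
    """Return up to `limit` (timestamp, row) pairs, newest first.

    `partition` maps timestamp clustering value -> row, i.e. the complete
    partition as read from the versioned view.
    """
    if limit < 0:
        raise ValueError("limit must be non-negative")
    newest_first = sorted(partition.items(),
                          key=lambda item: item[0], reverse=True)
    return newest_first[:limit]  # older versions are filtered out


partition = {1: "v1", 2: "v2", 3: "v3"}
print(read_versions(partition, 2))  # [(3, 'v3'), (2, 'v2')]
```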

I would like to have this validated before creating a Jira ticket.

Regards,
Bhuvan