Posted to commits@cassandra.apache.org by "Willem-Paul Stuurman (JIRA)" <ji...@apache.org> on 2015/05/04 22:23:08 UTC

[jira] [Comment Edited] (CASSANDRA-8141) Versioned rows

    [ https://issues.apache.org/jira/browse/CASSANDRA-8141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14527150#comment-14527150 ] 

Willem-Paul Stuurman edited comment on CASSANDRA-8141 at 5/4/15 8:22 PM:
-------------------------------------------------------------------------

Cassandra likes lots of data and therefore has all the makings of a system where previous versions of data are not deleted but marked with 'valid from' and 'valid to' dates (our very own time machine).

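The problem is retrieving all records valid at a certain date 'd':
'... WHERE validFrom < d AND validTo > d;' won't work. If validFrom and validTo are clustering columns, CQL accepts a range restriction on only one of them.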

We solved this in our environment by storing both values in a single field: in our case a decimal field whose integer part is the 'validTo' date and whose fractional part is the 'validFrom' date.
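To make the encoding concrete, here is a minimal Python sketch. It assumes dates are yyyyMMdd integers, which are always 8 digits, as in the example that follows. The functions `encode_period` and `decode_period` are hypothetical and are not part of the KnPeriodType class. They only model how the two dates share one decimal.

```python
from decimal import Decimal
from typing import Tuple

# Assumption: dates are yyyyMMdd integers, so validFrom always needs 8 fractional digits.
DATE_DIGITS = 8


def encode_period(valid_from: int, valid_to: int) -> Decimal:
    """Pack a date range into one decimal: validTo.validFrom (hypothetical helper)."""
    return Decimal(valid_to) + Decimal(valid_from).scaleb(-DATE_DIGITS)


def decode_period(period: Decimal) -> Tuple[int, int]:
    """Unpack a decimal built by encode_period into (valid_from, valid_to)."""
    valid_to = int(period)  # int() truncates, keeping only the integer part
    fraction = period - valid_to  # e.g. Decimal('0.20150301')
    valid_from = int(fraction.scaleb(DATE_DIGITS))  # shift the 8 digits back
    return valid_from, valid_to


if __name__ == "__main__":
    p = encode_period(20150301, 20150401)
    print(p)  # 20150401.20150301
    print(decode_period(p))  # (20150301, 20150401)
```

Putting validTo in the integer part means plain decimal ordering sorts rows primarily by their end date. With the DESC clustering order used below, the open-ended current version (validTo 99999999) comes first.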

An example using dates only, to illustrate the idea:

CREATE TABLE users (
	userId text, 
	validFor 'com.knollenstein.apps.cassandra.custom.KnPeriodType', 
	lastName text, 
PRIMARY KEY (userId, validFor)
) WITH CLUSTERING ORDER BY (validFor DESC);
// Add some data
UPDATE users SET lastName='Stuurma' WHERE userId='wp@test.com' AND validFor=20150401.20150301;
UPDATE users SET lastName='Stuurman' WHERE userId='wp@test.com' AND validFor=99999999.20150401;

To get the latest version:
SELECT lastName FROM users WHERE userId='wp@test.com' AND validFor=99999999;

To get the version valid on 2015-03-23:
SELECT lastName FROM users WHERE userId='wp@test.com' AND validFor=20150323;

When our custom comparator compares two date ranges (as happens when data is inserted or updated), it does a normal decimal comparison. Indexing and sorting therefore work the same as for any other decimal.

The magic happens on a SELECT. The column value holding a date range is compared with the single date from the WHERE clause, which is a decimal without a fractional part. The custom comparator returns 0 (equal) if that date falls within the range encoded in the column value.
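Below is a hypothetical Python model of the comparison rules just described. It is written as a compare()-style function that returns a negative number, 0 or a positive number. This is not the actual KnPeriodType code. It assumes yyyyMMdd dates and treats both ends of a range as inclusive (the comment does not say either way). Inclusive ends are what let `validFor=99999999` match the open-ended current row.

```python
from decimal import Decimal

DATE_DIGITS = 8  # assumption: yyyyMMdd dates, so validFrom fills 8 fractional digits


def is_point(value: Decimal) -> bool:
    """A single date from the WHERE clause: a decimal with no fractional part."""
    return value == value.to_integral_value()


def split_period(period: Decimal):
    """Return (valid_from, valid_to) from the validTo.validFrom encoding."""
    valid_to = int(period)
    valid_from = int((period - valid_to).scaleb(DATE_DIGITS))
    return valid_from, valid_to


def compare_period(a: Decimal, b: Decimal) -> int:
    """Hypothetical comparator: negative if a < b, 0 if 'equal', positive if a > b."""
    a_point, b_point = is_point(a), is_point(b)
    if a_point == b_point:
        # Two ranges (insert/update) or two dates: plain decimal ordering.
        return (a > b) - (a < b)
    if b_point:
        valid_from, valid_to = split_period(a)
        if b < valid_from:
            return 1  # the whole range lies after the date
        if b > valid_to:
            return -1  # the whole range lies before the date
        return 0  # the date falls inside the range: treat it as equal
    # a is the date and b is the range: mirror the case above.
    return -compare_period(b, a)
```

A range is ranked below the date when the date is after validTo, and above it when the date is before validFrom. This keeps the result consistent with plain decimal ordering, so a sorted lookup still lands on the matching row, provided the periods stored for one user do not overlap.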

We can now select all the records valid at a certain date/time or just the latest version.
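To see how the two example queries resolve, here is a small Python simulation of one partition. It is a sketch under several assumptions. It uses a linear scan instead of Cassandra's ordered lookup, it treats both range ends as inclusive, and `last_name_valid_on` is a hypothetical helper.

```python
from decimal import Decimal

DATE_DIGITS = 8  # assumption: yyyyMMdd dates


def matches(period: Decimal, date: int) -> bool:
    """True if date lies in the inclusive range encoded as validTo.validFrom."""
    valid_to = int(period)
    valid_from = int((period - valid_to).scaleb(DATE_DIGITS))
    return valid_from <= date <= valid_to


# Rows of the partition userId='wp@test.com', kept in clustering order
# validFor DESC, mirroring the two UPDATE statements in the comment.
rows = sorted(
    [
        (Decimal("20150401.20150301"), "Stuurma"),
        (Decimal("99999999.20150401"), "Stuurman"),
    ],
    key=lambda row: row[0],
    reverse=True,
)


def last_name_valid_on(date: int):
    """Model of SELECT lastName ... AND validFor=<date>; None if no row matches."""
    for period, last_name in rows:  # linear scan stands in for the real lookup
        if matches(period, date):
            return last_name
    return None
```

Here `last_name_valid_on(99999999)` gives "Stuurman" and `last_name_valid_on(20150323)` gives "Stuurma". The boundary date 20150401 lies in both ranges. With inclusive ends and DESC order, this sketch returns the newer row.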

Now, I'm sure there is a much more elegant way to solve this (without a custom comparator), but maybe this can help in getting versioning functionality into Cassandra.


> Versioned rows
> --------------
>
>                 Key: CASSANDRA-8141
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8141
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Robert Stupp
>
> People still talk about "global locks" and "distributed transactions". I think that introducing such things is both painful to implement and dangerous for a distributed application.
> But it could be manageable to introduce "versioned rows".
> By "versioned rows" I mean to issue a SELECT against data that was valid at a specified timestamp - something like {{SELECT ... WITH READTIME=1413724696473}}.
> In combination with something like {{UPDATE ... IF NOT MODIFIED SINCE 1413724696473}} it could be powerful. (Sure, this one could already be achieved by the application today.)
> It's just an idea I'd like to discuss.
> We already have something like "versioned rows" implicitly, since the "old" data is still in the SSTables. Besides that, it could be necessary to:
> * not throw away old columns/rows for some configurable timespan
> * extend the row cache to optionally maintain "old" data
> * (surely something more)
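The issue proposes two operations: reading as of a timestamp, and updating only if nothing has changed since a timestamp. Both can be modelled in memory to pin down their semantics. This is a hypothetical Python sketch. `VersionedTable` and its methods are illustrative names. The `READTIME` and `IF NOT MODIFIED SINCE` syntax quoted above is a proposal, not existing CQL.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class VersionedTable:
    """Hypothetical model: every write is kept as a (timestamp, value) version per key."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[int, str]]] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        # Keep old versions instead of overwriting; the list stays sorted by timestamp.
        bisect.insort(self._versions.setdefault(key, []), (ts, value))

    def read_at(self, key: str, read_time: int) -> Optional[str]:
        """Like SELECT ... WITH READTIME=read_time: newest version at or before read_time."""
        versions = self._versions.get(key, [])
        timestamps = [ts for ts, _ in versions]
        i = bisect.bisect_right(timestamps, read_time)
        return versions[i - 1][1] if i else None

    def update_if_not_modified_since(self, key: str, value: str, since: int, ts: int) -> bool:
        """Like UPDATE ... IF NOT MODIFIED SINCE since: apply only if no newer write exists."""
        versions = self._versions.get(key, [])
        if versions and versions[-1][0] > since:
            return False  # someone wrote after `since`: reject the update
        self.write(key, value, ts)
        return True
```

The issue suggests keeping old data only for a configurable timespan. This sketch never discards versions, which keeps the read and update semantics easy to see.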


