Posted to user@cassandra.apache.org by zhen ye <ye...@gmail.com> on 2010/12/01 13:23:04 UTC

Is there any way to store multi-version data based on the timestamp?

Hi all,

I ran some tests to see whether Cassandra can store multiple versions of
the same data. Judging from the test code below, it seems to store only
one version, which is different from HBase.

Can somebody confirm this?
I would really appreciate a suggestion on how to use Cassandra to store
multi-version data efficiently.

// Write two values to the same column, each with its own timestamp.
client.insert(keyspace, key1, path, "value1".getBytes(), timestamp1,
        ConsistencyLevel.ALL);
client.insert(keyspace, key1, path, "value2".getBytes(), timestamp2,
        ConsistencyLevel.ALL);

// Remove the column using the second timestamp.
client.remove(keyspace, key1, path, timestamp2, ConsistencyLevel.ALL);

// Read the column back and print its value.
ColumnOrSuperColumn column = client.get(keyspace, key1, path,
        ConsistencyLevel.ALL);
System.out.println(new String(column.column.value));

The result is:
NotFoundException()
	at org.apache.cassandra.thrift.Cassandra$get_result.read(Cassandra.java:3639)
	at org.apache.cassandra.thrift.Cassandra$Client.recv_get(Cassandra.java:344)
	at org.apache.cassandra.thrift.Cassandra$Client.get(Cassandra.java:319)
	at ThriftHelloWorld.main(ThriftHelloWorld.java:52)

Re: Is there any way to store multi-version data based on the timestamp?

Posted by Ed Anuff <ed...@anuff.com>.

If you go this route, be sure to take a look at the custom column comparator
I wrote to make this sort of thing easier:

https://github.com/edanuff/CassandraCompositeType

On Wed, Dec 1, 2010 at 4:56 AM, Daniel Lundin <dl...@eintr.org> wrote:

> You could also use a standard column family, composing the version
> into the column name:
>
>  foo => { bar:v1 => data, bar:v2 => data, bar:v3 => data }
>
> Here, there's a cost on retrieval of course, which may or may not work
> depending on your access pattern. If you do large slices, it's
> probably not an option. It could be feasible to write a custom
> comparator sorting on some version component, to allow efficient
> slicing of the "latest" versions.
>
>

Re: Is there any way to store multi-version data based on the timestamp?

Posted by Robert Coli <rc...@digg.com>.

On 12/1/10 4:56 AM, Daniel Lundin wrote:
>> Correct. Unlike BigTable and HBase, Cassandra columns don't have a
>> version dimension.
>> Timestamp is used for (crude) conflict resolution, and older versions
>> are always overwritten.
I would be careful with the word "overwritten" here, as it obscures the
immutability of SSTables. A conceptual understanding of that immutability
is important for understanding the actual versioning behavior and how it
relates to what data has to be read to satisfy queries. :)
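
To make the immutability point concrete, here is a minimal Java sketch. It is
not Cassandra code: the class ImmutableTablesSketch and all of its methods are
hypothetical, and it models only the general idea that flushed tables are never
modified, reads pick the newest cell across all tables, and a compaction step is
what finally drops superseded cells.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model; names are illustrative, not Cassandra's API.
class ImmutableTablesSketch {
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Flushed tables are read-only snapshots; only the memtable accepts writes.
    private final List<Map<String, Cell>> flushedTables = new ArrayList<>();
    private Map<String, Cell> memtable = new TreeMap<>();

    void write(String column, String value, long timestamp) {
        keepNewest(memtable, column, new Cell(value, timestamp));
    }

    // Freeze the current memtable as an immutable table and start a new one.
    void flush() {
        flushedTables.add(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    // A read consults the memtable and every flushed table, keeping the newest cell.
    String read(String column) {
        Cell newest = memtable.get(column);
        for (Map<String, Cell> table : flushedTables) {
            Cell candidate = table.get(column);
            if (candidate != null && (newest == null || candidate.timestamp > newest.timestamp)) {
                newest = candidate;
            }
        }
        return newest == null ? null : newest.value;
    }

    // How many flushed tables still hold some version of the column.
    int tablesContaining(String column) {
        int count = 0;
        for (Map<String, Cell> table : flushedTables) {
            if (table.containsKey(column)) count++;
        }
        return count;
    }

    // Merge all flushed tables into one, discarding superseded cells.
    void compact() {
        Map<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : flushedTables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                keepNewest(merged, entry.getKey(), entry.getValue());
            }
        }
        flushedTables.clear();
        flushedTables.add(Collections.unmodifiableMap(merged));
    }

    private static void keepNewest(Map<String, Cell> table, String column, Cell incoming) {
        Cell current = table.get(column);
        if (current == null || incoming.timestamp > current.timestamp) table.put(column, incoming);
    }

    public static void main(String[] args) {
        ImmutableTablesSketch store = new ImmutableTablesSketch();
        store.write("bar", "v1", 1L);
        store.flush();
        store.write("bar", "v2", 2L);
        store.flush();
        System.out.println(store.read("bar"));             // v2
        System.out.println(store.tablesContaining("bar")); // 2
        store.compact();
        System.out.println(store.tablesContaining("bar")); // 1
    }
}
```

In this model the older cell stays on disk after the second write and has to be
read and discarded at query time until compact() runs, which is the behavior the
word "overwritten" tends to hide.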

=Rob

Re: Is there any way to store multi-version data based on the timestamp?

Posted by Daniel Lundin <dl...@eintr.org>.

> I did some tests to see if Cassandra can store multiple versions of the
> same data, but from the test code below it seems it can only store one
> version's data, which is different from HBase.
> Can somebody help to confirm this?

Correct. Unlike BigTable and HBase, Cassandra columns don't have a
version dimension.
Timestamp is used for (crude) conflict resolution, and older versions
are always overwritten.
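
As a rough illustration of why the test in the original post ends in
NotFoundException, here is a simplified Java model of per-column timestamp
reconciliation. It is a sketch, not Cassandra's implementation: the class
LastWriteWinsRow and its methods are hypothetical, the numeric timestamps are
made up (the post does not give the values of timestamp1 and timestamp2), and
the rule that a delete marker wins a timestamp tie is an assumption of this
model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical toy model of last-write-wins reconciliation; not Cassandra code.
class LastWriteWinsRow {
    private static final class Cell {
        final String value;   // null marks a delete (a tombstone)
        final long timestamp;
        Cell(String value, long timestamp) { this.value = value; this.timestamp = timestamp; }
    }

    // Exactly one cell is visible per column name; there is no version dimension.
    private final Map<String, Cell> cells = new HashMap<>();

    void insert(String column, String value, long timestamp) {
        reconcile(column, new Cell(value, timestamp));
    }

    void remove(String column, long timestamp) {
        reconcile(column, new Cell(null, timestamp));
    }

    // Keep the cell with the higher timestamp.
    // Assumption of this sketch: on a tie, the delete marker wins.
    private void reconcile(String column, Cell incoming) {
        Cell current = cells.get(column);
        boolean incomingWins = current == null
                || incoming.timestamp > current.timestamp
                || (incoming.timestamp == current.timestamp && incoming.value == null);
        if (incomingWins) cells.put(column, incoming);
    }

    Optional<String> get(String column) {
        Cell cell = cells.get(column);
        return (cell == null || cell.value == null) ? Optional.empty() : Optional.of(cell.value);
    }

    public static void main(String[] args) {
        LastWriteWinsRow row = new LastWriteWinsRow();
        row.insert("path", "value1", 1L);   // stands in for timestamp1
        row.insert("path", "value2", 2L);   // stands in for timestamp2
        row.remove("path", 2L);             // delete at timestamp2
        System.out.println(row.get("path").isPresent()); // false
    }
}
```

Under these assumptions the delete does not reveal "value1": that cell already
lost to "value2", and the delete marker then supersedes "value2", so the read
finds nothing.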

> I would be very grateful if someone could kindly give me
> a suggestion of how to use Cassandra to store multi-version data
> efficiently.

One way is using supercolumns with subcolumns as versions:

 foo => { bar => {v1: data, v2: data, v3: data} ... }
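
The shape of that layout can be sketched with plain Java collections. This is
only a model of the data structure, not Thrift client code: SuperColumnSketch,
put, at and latest are hypothetical names, and versions are assumed to be
numeric so that they sort naturally.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of one row: supercolumn name -> (version -> data).
class SuperColumnSketch {
    private final Map<String, TreeMap<Long, String>> row = new TreeMap<>();

    // Store data under the given supercolumn as a subcolumn named by its version.
    void put(String superColumn, long version, String data) {
        row.computeIfAbsent(superColumn, k -> new TreeMap<>()).put(version, data);
    }

    // Data stored for one specific version, or null if there is none.
    String at(String superColumn, long version) {
        TreeMap<Long, String> versions = row.get(superColumn);
        return versions == null ? null : versions.get(version);
    }

    // Data of the highest version, or null if the supercolumn is empty or absent.
    String latest(String superColumn) {
        TreeMap<Long, String> versions = row.get(superColumn);
        return (versions == null || versions.isEmpty()) ? null : versions.lastEntry().getValue();
    }

    public static void main(String[] args) {
        SuperColumnSketch foo = new SuperColumnSketch();
        foo.put("bar", 1L, "data1");
        foo.put("bar", 2L, "data2");
        foo.put("bar", 3L, "data3");
        System.out.println(foo.latest("bar")); // data3
        System.out.println(foo.at("bar", 1L)); // data1
    }
}
```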

You could also use a standard column family, composing the version
into the column name:

 foo => { bar:v1 => data, bar:v2 => data, bar:v3 => data }

Here, there's a cost on retrieval of course, which may or may not work
depending on your access pattern. If you do large slices, it's
probably not an option. It could be feasible to write a custom
comparator sorting on some version component, to allow efficient
slicing of the "latest" versions.
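
A minimal Java sketch of that idea follows, using a sorted map in place of a
column family row. Everything here is illustrative: VersionedColumnsSketch,
VersionedName and latest are hypothetical names, versions are assumed to be
numeric, and the comparator is an ordinary java.util.Comparator rather than a
real Cassandra comparator class. It sorts by column name first and then by
version in descending order, so the newest versions of a name sit at the start
of its range.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of a row whose column names embed a version.
class VersionedColumnsSketch {
    static final class VersionedName {
        final String name;
        final long version;
        VersionedName(String name, long version) { this.name = name; this.version = version; }
    }

    // Order by name ascending, then by version descending (newest first).
    static final Comparator<VersionedName> NEWEST_FIRST = (a, b) -> {
        int byName = a.name.compareTo(b.name);
        return byName != 0 ? byName : Long.compare(b.version, a.version);
    };

    private final TreeMap<VersionedName, String> row = new TreeMap<>(NEWEST_FIRST);

    void put(String name, long version, String data) {
        row.put(new VersionedName(name, version), data);
    }

    // Return up to `limit` values stored under `name`, newest version first.
    List<String> latest(String name, int limit) {
        List<String> result = new ArrayList<>();
        // Start the slice at the highest possible version of `name`.
        VersionedName start = new VersionedName(name, Long.MAX_VALUE);
        for (Map.Entry<VersionedName, String> entry : row.tailMap(start, true).entrySet()) {
            if (!entry.getKey().name.equals(name) || result.size() >= limit) break;
            result.add(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        VersionedColumnsSketch foo = new VersionedColumnsSketch();
        foo.put("bar", 1L, "data1");
        foo.put("bar", 2L, "data2");
        foo.put("bar", 3L, "data3");
        foo.put("baz", 1L, "other");
        System.out.println(foo.latest("bar", 2)); // [data3, data2]
    }
}
```

With this ordering, fetching the latest versions is a short forward slice from
the start of the name's range, and it stops without scanning the older versions.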

But first, reach for supercolumns.

/d