Posted to user@cassandra.apache.org by Raj Bakhru <rb...@gmail.com> on 2011/02/05 23:28:16 UTC

revisioned data

Hi all -

We're new to Cassandra and have read plenty on the data model, but we wanted
to poll for thoughts on how to best handle this structure.

We have simple objects that have an ID and we want to maintain a history of
all the revisions.

e.g.
MyObject:
    ID (long)
    name
    other fields
    update time (long [date])


Any time the object changes, we'll store a new version of the object
(same ID, but different update time and other fields).  We need to be able
to query what the object was as-of any time historically.  We also need
to be able to query what some or all of the items of this object type
were as-of any time historically.

In SQL, we'd just find the row with max(update time) where update time <
queried_as_of_time.
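
A minimal sketch of that SQL lookup, using Python's built-in sqlite3 module
so it can be run as-is. The table name my_object, its column names, and the
sample rows are hypothetical and only there for illustration. The comparison
is strict (update_time < as-of time), as in the description above.

```python
import sqlite3

def as_of(conn, object_id, as_of_time):
    """Return (name, update_time) of the newest revision strictly before as_of_time."""
    return conn.execute(
        "SELECT name, update_time FROM my_object "
        "WHERE id = ? AND update_time < ? "
        "ORDER BY update_time DESC LIMIT 1",
        (object_id, as_of_time),
    ).fetchone()

def all_as_of(conn, as_of_time):
    """Return one (id, name, update_time) row per object: its newest revision before as_of_time."""
    return conn.execute(
        "SELECT o.id, o.name, o.update_time FROM my_object o "
        "WHERE o.update_time = ("
        "  SELECT MAX(update_time) FROM my_object "
        "  WHERE id = o.id AND update_time < ?) "
        "ORDER BY o.id",
        (as_of_time,),
    ).fetchall()

# Hypothetical sample data: two revisions of object 625, one of object 626.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_object ("
    "id INTEGER, name TEXT, update_time INTEGER, "
    "PRIMARY KEY (id, update_time))"
)
conn.executemany(
    "INSERT INTO my_object VALUES (?, ?, ?)",
    [(625, "first", 100), (625, "second", 200), (626, "other", 150)],
)
```

Calling as_of(conn, 625, 150) picks the revision written at time 100, and
all_as_of(conn, 175) returns one row for each of the two objects.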

In Cassandra, we were thinking of modeling as follows:

CF:  MyObjectType
Super-Column: ID of object (e.g. 625)
Column:  updatetime  (e.g. "1000245242")
Value: byte[] of serialized object

We were thinking of using the OrderPreservingPartitioner and using range
queries against the data.
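
To make the intended lookup concrete, here is a small in-memory sketch in
plain Python. It is not a Cassandra client: the RevisionStore class and its
methods are hypothetical stand-ins for a column family whose columns are kept
sorted by update time. It assumes update times are compared as numbers; if
they were compared as strings, "999" would sort after "1000245242". In
Cassandra the as_of step would roughly correspond to a reversed column slice
starting at the as-of time with a count of 1.

```python
import bisect

class RevisionStore:
    """In-memory stand-in: object ID -> revisions sorted by update time."""

    def __init__(self):
        self._times = {}   # object_id -> sorted list of update times (ints)
        self._values = {}  # object_id -> {update_time: serialized object bytes}

    def put(self, object_id, update_time, payload):
        """Store one revision; writing the same update time again overwrites it."""
        times = self._times.setdefault(object_id, [])
        values = self._values.setdefault(object_id, {})
        if update_time not in values:
            bisect.insort(times, update_time)
        values[update_time] = payload

    def as_of(self, object_id, as_of_time):
        """Return (update_time, payload) of the newest revision strictly before as_of_time."""
        times = self._times.get(object_id, [])
        # bisect_left gives the count of revisions with update_time < as_of_time.
        idx = bisect.bisect_left(times, as_of_time)
        if idx == 0:
            return None
        t = times[idx - 1]
        return t, self._values[object_id][t]

    def all_as_of(self, as_of_time):
        """Return {object_id: (update_time, payload)} for every object that existed by then."""
        result = {}
        for object_id in self._times:
            hit = self.as_of(object_id, as_of_time)
            if hit is not None:
                result[object_id] = hit
        return result
```

Note that all_as_of walks every object, which is the expensive part of the
"all items as-of a time" requirement in any layout of this kind.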

Does this make sense?  Are we approaching this in the wrong way?

Thanks a lot

Re: revisioned data

Posted by Jonathan Ellis <jb...@gmail.com>.
Using supercolumns to contain versions is reasonable, as long as the
number of versions is not too large.

On Sat, Feb 5, 2011 at 4:38 PM, Victor Kabdebon
<vi...@gmail.com> wrote:
> Hello Raj,
>
> No, it actually doesn't make sense from Cassandra's point of view:
> OrderPreservingPartitioner preserves the order of the keys. In your model
> the ordering will be done according to the supercolumn name. In that case
> you can set the ordering with compare_super_with (sorry, I don't remember
> exactly the new term in Cassandra, but that's the idea). The compare_with
> setting will order your columns inside your supercolumn.
>
> However, and I think many will agree here, try to avoid SuperColumns.
> Rather than using SuperColumns, try to think of it like this:
>
> CF1 : "ObjectStore"
> Key :ID (long)
> Columns : {
>     name
>     other fields
>     update time (long [date])
>     ...}
>
> CF2 : "ObjectOrder"
> Key : "myorderedobjects"
> Column:{
>    { name : identifier that can be sorted
>    value :ObjectID},
>    ...
> }
>
> Best regards,
> Victor Kabdebon,
> http://www.voxnucleus.fr



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: revisioned data

Posted by Victor Kabdebon <vi...@gmail.com>.
Hello Raj,

No, it actually doesn't make sense from Cassandra's point of view:
OrderPreservingPartitioner preserves the order of the *keys*. In your model
the ordering will be done according to the *supercolumn name*. In that case
you can set the ordering with compare_super_with (sorry, I don't remember
exactly the new term in Cassandra, but that's the idea). The compare_with
setting will order your columns inside your supercolumn.

However, and I think many will agree here, try to avoid SuperColumns.
Rather than using SuperColumns, try to think of it like this:

CF1 : "ObjectStore"
Key :ID (long)
Columns : {
    name
    other fields
    update time (long [date])
    ...}

CF2 : "ObjectOrder"
Key : "myorderedobjects"
Column:{
   { name : identifier that can be sorted
   value :ObjectID},
   ...
}
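
The two column families above can be sketched in plain Python to show how
they would work together. This is only an in-memory illustration, not
Cassandra client code. The function names save and ids_between are
hypothetical, and using (update time, object ID) as the sortable column name
is an assumption; any identifier that sorts the way you want would do.

```python
import bisect

object_store = {}  # CF1 "ObjectStore": object ID -> dict of columns
object_order = []  # CF2 "ObjectOrder", one row: sorted (sortable name, object ID) pairs

def save(object_id, name, update_time):
    """Write the object's columns and add an entry to the ordered index row."""
    # As sketched, ObjectStore keeps only the latest columns for each ID.
    object_store[object_id] = {"name": name, "update_time": update_time}
    # Assumed sortable column name: the update time, with the ID as tie-breaker.
    bisect.insort(object_order, (update_time, object_id))

def ids_between(start, end):
    """Return object IDs whose index entries fall in [start, end), in sorted order."""
    # A 1-tuple sorts before any 2-tuple with the same first element,
    # so (start,) and (end,) act as inclusive/exclusive range bounds.
    lo = bisect.bisect_left(object_order, (start,))
    hi = bisect.bisect_left(object_order, (end,))
    return [object_id for _, object_id in object_order[lo:hi]]
```

Reading a range from object_order and then fetching each ID from
object_store mirrors a column slice on the index row followed by lookups by
key in the first column family.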

Best regards,
Victor Kabdebon,
http://www.voxnucleus.fr
