Posted to user@cassandra.apache.org by Pierre-Yves Ritschard <py...@spootnik.org> on 2013/01/03 11:12:18 UTC

Re: Last Modified Time Series in cassandra

You can use an approach with two column families (CFs).

The first one would be

ExpiredCF
|-- File = Key
  |-- Reversed(TimeUUID) = Column-Name (the time of the last change)

In this CF, each entry expires via a TTL (after a day, an hour, whatever).

EventualCF
|-- File = Key
  |-- String = Column-Name (e.g. 'last-updated')


Storing a file update for a key 'K' at a time 'T' could then be:
ExpiredCF[K][T] = null
EventualCF[K]['last-updated'] = T

You can then query the first column name of ExpiredCF[K]; because the
comparator is reversed, that is the most recent change. If it doesn't
exist (it expired), fall back to EventualCF[K]['last-updated'].
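
The write and read paths described above can be sketched roughly as follows.
This is only an in-memory model of the two CFs, not a Cassandra client: the
class name TwoCFStore, its method names, and the injectable clock are all
hypothetical, and expiry is simulated by storing a deadline next to each
column instead of relying on a real TTL.

```python
import time


class TwoCFStore:
    """In-memory stand-in for ExpiredCF and EventualCF (illustration only)."""

    def __init__(self, ttl_seconds, clock=time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock       # injectable so expiry can be simulated
        self.expired_cf = {}     # key -> {timeuuid: expiry deadline}
        self.eventual_cf = {}    # key -> {column name: value}

    def record_update(self, key, t):
        """Store a file update for `key` at TimeUUID `t`."""
        # ExpiredCF[K][T] = null, written with a TTL
        deadline = self.clock() + self.ttl_seconds
        self.expired_cf.setdefault(key, {})[t] = deadline
        # EventualCF[K]['last-updated'] = T, never expires
        self.eventual_cf.setdefault(key, {})["last-updated"] = t

    def last_modified(self, key):
        """Return the latest TimeUUID for `key`, or None if it is unknown."""
        now = self.clock()
        row = self.expired_cf.get(key, {})
        live = [t for t, deadline in row.items() if deadline > now]
        if live:
            # With a reversed comparator the first column is the newest one;
            # here that is emulated by taking the largest UUID timestamp.
            return max(live, key=lambda u: u.time)
        # Every recent entry has expired: fall back to the durable value.
        return self.eventual_cf.get(key, {}).get("last-updated")
```

A caller would do something like store.record_update('K', uuid.uuid1()) on
each change and store.last_modified('K') on read. Against a real cluster the
first branch would be a reversed slice query limited to one column.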

The advantage of this approach is that ExpiredCF cleans itself up
automatically, which can help shorten repair times if you have lots of
updates.



On Mon, Dec 24, 2012 at 7:42 AM, Ravikumar Govindarajan <
ravikumar.govindarajan@gmail.com> wrote:

> I can append timeuuid on every update and resolve conflicts on read to
> support time series data on last-modified-time.
>
> Ex:
> -- ExampleCF
>        | -- SomeKey = Key
>             | -- TimeUUIDNew = Column-Name
>             | -- PKID = Column-Value
>             ......
>             | -- TimeUUIDOld = Column-Name
>             | -- PKID = Column-Value
>
> But how to GC TimeUUIDOld?
>
> One solution is to book-keep this info in a CleanupCF to periodically
> sweep all old data.
>
> Ex:
>
> -- ExampleCF
>        | -- SomeKey = Key
>             | -- TimeUUID = Column-Name
>             | -- PKID = Column-Value
>
> -- ExampleReverseIndexCF
>        | -- <SomeKey> = Key
>             | -- PKID+TimeUUID = Composite Column-Name
>             | -- Null = Column-Value
>
> -- CleanupCF
>        | -- Schedule-Cleanup-Task-Id = Key
>             | -- <Some-Key>+PKID = Composite Column-Name
>             | -- Null = Column-Value
>
> Will this approach work? Are there other elegant solutions for this
> problem of maintaining time-series data for last-modified-time?
>
> --
> Ravi
>
>
>
> On Fri, Dec 21, 2012 at 11:07 PM, Andrey Ilinykh <ai...@gmail.com> wrote:
>
>> You can select a column slice (specify a time range which for sure
>> contains the latest data), but ask Cassandra to return only one column.
>> That column is the latest one. For the best performance, use reversed
>> sorting order.
>>
>> Andrey
>>
>>
>> On Fri, Dec 21, 2012 at 6:40 AM, Ravikumar Govindarajan <
>> ravikumar.govindarajan@gmail.com> wrote:
>>
>>> How do we model time series data in Cassandra for last modified time?
>>>
>>> -- ExampleCF
>>>        | -- SomeKey = Key
>>>             | -- TimeUUID = Column-Name
>>>             | -- PKID = Column-Value
>>>
>>> -- ExampleReverseIndexCF
>>>        | -- SomeKey = Key
>>>             | -- PKID = Column-Name
>>>             | -- TimeUUID = Column-Value
>>>
>>> To correctly reflect "last-modified-time", I need to read the existing
>>> timeuuid, delete it, and add the incoming timeuuid.
>>>
>>> Are there alternatives to the above approach? It looks a bit
>>> heavy-weight.
>>>
>>> --
>>> Ravi
>>>
>>
>>
>