Posted to user@cassandra.apache.org by Brandon Williams <dr...@gmail.com> on 2010/03/18 22:14:40 UTC

Re: Storing lots of data as Columns in a Column Family (ref Twissandra)

On Thu, Mar 18, 2010 at 4:08 PM, Muhammed Nasrullah <na...@gmail.com> wrote:

> Hello folks,
>
> Twissandra <http://twissandra.com/> (Twitter clone example for Cassandra)
> has a public page where every public update/tweet is stored in a column
> family under the key !public! like so:
>
> Userline = {
>     '!public!': {
>         # timestamp of tweet: tweet id
>         1267414247561777: '7561a442-24e2-11df-8924-001ff3591711',
>         1267414277402340: 'f0c8d718-24e2-11df-8924-001ff3591711',
>         1267414305866969: 'f9e6d804-24e2-11df-8924-001ff3591711',
>         1267414319522925: '02ccb5ec-24e3-11df-8924-001ff3591711',
>     },
> }
>
>
> My question is, because this is the public timeline, it will get a lot of
> updates and because this is a single row keyed by '!public!', this won't fit
> in memory eventually. Is there a better way to model this? The problem is
> that the data needs to be retrieved in reverse chronological order,
> something which cannot be done while getting a range of keys without knowing
> the start and finish keys in advance.


The rows could be named and partitioned by date/time, which can be known in
advance.  For example, '!public!20100318' could contain the public timeline
for that day.

-Brandon
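
The day-partitioned row naming suggested above could be sketched roughly as
follows. This is only an illustration: the helper name public_row_key is
hypothetical, and it assumes the column names are microseconds since the Unix
epoch (as the sample values in the question appear to be) and that days are
cut in UTC.

```python
from datetime import datetime, timezone

PUBLIC_PREFIX = "!public!"  # row key prefix used in the thread


def public_row_key(ts_micros: int) -> str:
    """Return the day-bucketed row key for a tweet timestamp.

    Assumption: ts_micros is microseconds since the Unix epoch and
    the day boundary is taken in UTC.
    """
    seconds = ts_micros // 1_000_000  # integer division avoids float rounding
    day = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return PUBLIC_PREFIX + day.strftime("%Y%m%d")


if __name__ == "__main__":
    # First sample timestamp from the question.
    print(public_row_key(1267414247561777))
```

Because the key depends only on the timestamp, both writers and readers can
compute it without looking anything up first.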

Re: Storing lots of data as Columns in a Column Family (ref Twissandra)

Posted by Eric Florenzano <fl...@gmail.com>.
>
> The rows could be named and partitioned by date/time, which can be known in
> advance.  For example, '!public!20100318' could contain the public timeline
> for that day.
>

Yes, I thought of doing this.  Then I realized there'd be boundary cases at
the start of a new day, where it'd be best to query both the new day and
the old one.  In the end, since I wanted the project to be simpler to
understand, I decided to keep the simpler approach.

Perhaps I will add a comment noting that this isn't a particularly scalable
solution, and suggest the day barrier.

Also if my understanding is correct, CASSANDRA-674 will help alleviate
(although it won't solve) the problem.

-Eric
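
The boundary case described in this reply (just after midnight the new day's
row is nearly empty, so the previous day's row has to be read too) might be
handled along these lines. This is a sketch under stated assumptions, not
Twissandra's code: recent_public is a hypothetical helper, and `store` is a
plain dict of {row key: {timestamp: tweet id}} standing in for the column
family; a real client would issue a reversed column slice per row instead.

```python
from datetime import datetime, timedelta, timezone


def recent_public(store, now, limit, max_days=7):
    """Collect up to `limit` (timestamp, tweet_id) pairs, newest first.

    Starts at the row for `now` and walks back one day row at a time
    until enough columns are found or `max_days` rows have been read.
    """
    results = []
    if limit <= 0:
        return results
    day = now
    for _ in range(max_days):
        row = store.get("!public!" + day.strftime("%Y%m%d"), {})
        # Newest column first within the row (reverse chronological order).
        for ts in sorted(row, reverse=True):
            results.append((ts, row[ts]))
            if len(results) == limit:
                return results
        day -= timedelta(days=1)
    return results


if __name__ == "__main__":
    store = {
        "!public!20100317": {100: "tweet-a", 200: "tweet-b"},
        "!public!20100318": {300: "tweet-c"},
    }
    now = datetime(2010, 3, 18, 0, 0, 5, tzinfo=timezone.utc)
    print(recent_public(store, now, limit=2))
```

The max_days cap is an arbitrary choice here so that a quiet timeline does not
cause an unbounded walk back through empty rows.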