
Posted to commits@cassandra.apache.org by "Ahmet AKYOL (Issue Comment Edited) (JIRA)" <ji...@apache.org> on 2012/03/05 16:39:56 UTC

[jira] [Issue Comment Edited] (CASSANDRA-3999) Column families for "most recent data", (a.k.a. size-safe wide rows)

    [ https://issues.apache.org/jira/browse/CASSANDRA-3999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222403#comment-13222403 ] 

Ahmet AKYOL edited comment on CASSANDRA-3999 at 3/5/12 3:38 PM:
----------------------------------------------------------------

OK, this is exactly CASSANDRA-3929, so I'll add a comment there.

I asked this as a question on [stackoverflow|http://stackoverflow.com/questions/9546458/column-families-for-most-recent-data-in-cassandra], but there was no answer, so I opened this issue. 

Thanks. 
                
      was (Author: liqusha):
    OK, this is exactly CASSANDRA-3929. I asked this as a question on [stackoverflow|http://stackoverflow.com/questions/9546458/column-families-for-most-recent-data-in-cassandra], but there was no answer, so I opened this issue. Thanks.
                  
> Column families for "most recent data", (a.k.a. size-safe wide rows)
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-3999
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3999
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Ahmet AKYOL
>
> "Wide row design" is very handy (for time series data), but we have to keep each row's size within an acceptable limit. So we need buckets, right? Monthly, daily or even hourly buckets... The problem with the bucket approach is (as always) the distribution of data across rows. 
> So why not tell Cassandra that we want a column family that behaves like an LRU cache, but on disk? If we start the design from the queries, we usually end up with "most recent data" queries. This "size-safe wide rows" approach could be very useful in many use cases.
> Here are some hypothetical example column family storage parameters:
> max_column_number_hint : 1000 // meaning: try to keep around 1000 columns. Since it's a hint, we (users) are OK with tombstones or a range of 800 - 1200
> or
> max_row_size_hint : 1MB
> I don't know the Cassandra internals, but C* already has background jobs (for compaction, deletion and TTL) and columns already have timestamps. So it makes sense from both the user's and C*'s point of view.
> P.S.: Sorry for my poor English; this is my very first "issue" :)
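
To make the two approaches in the description concrete, here is a minimal in-memory sketch in Python. It is not Cassandra code and uses no Cassandra API. The names bucket_row_key and SizeSafeRow, the row key format, and the 20% slack above the hint are all assumptions made for illustration; only max_column_number_hint comes from the issue text, where it is itself a hypothetical parameter.

```python
from datetime import datetime, timezone


def bucket_row_key(series_id: str, ts: datetime, granularity: str = "day") -> str:
    """Manual bucketing: one row per series per time bucket (hypothetical key format)."""
    formats = {"month": "%Y%m", "day": "%Y%m%d", "hour": "%Y%m%d%H"}
    return f"{series_id}:{ts.strftime(formats[granularity])}"


class SizeSafeRow:
    """In-memory stand-in for a wide row governed by a max_column_number_hint."""

    def __init__(self, max_column_number_hint: int):
        self.hint = max_column_number_hint
        # Assumption: allow about 20% slack above the hint before trimming,
        # loosely mirroring the "800 - 1200" tolerance in the description.
        self.high_water = max_column_number_hint + max_column_number_hint // 5
        self.columns = {}  # column name (an integer timestamp here) -> value

    def insert(self, timestamp: int, value: str) -> None:
        self.columns[timestamp] = value
        # Stand-in for a background job: trim only once the slack is used up.
        if len(self.columns) > self.high_water:
            self.trim()

    def trim(self) -> None:
        """Drop the oldest columns until only `hint` columns remain."""
        excess = len(self.columns) - self.hint
        if excess <= 0:
            return
        for ts in sorted(self.columns)[:excess]:
            del self.columns[ts]

    def most_recent(self, n: int) -> list:
        """Return the n newest (timestamp, value) pairs, newest first."""
        newest = sorted(self.columns, reverse=True)[:n]
        return [(ts, self.columns[ts]) for ts in newest]


if __name__ == "__main__":
    when = datetime(2012, 3, 5, 16, 39, tzinfo=timezone.utc)
    print(bucket_row_key("sensor42", when, "hour"))

    row = SizeSafeRow(max_column_number_hint=1000)
    for t in range(5000):
        row.insert(t, f"v{t}")
    print(len(row.columns), row.most_recent(3))
```

With bucketing, the client has to pick a granularity up front and rows still fill unevenly. In the second model the row count stays near the hint regardless of write rate, at the cost of silently discarding the oldest columns.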
