Posted to commits@cassandra.apache.org by "Haebin Na (JIRA)" <ji...@apache.org> on 2014/04/30 04:34:20 UTC

[jira] [Created] (CASSANDRA-7115) Partitioned Column Family (Table) based on Column Keys (Sorta TTLed Table)

Haebin Na created CASSANDRA-7115:
------------------------------------

             Summary: Partitioned Column Family (Table) based on Column Keys (Sorta TTLed Table)
                 Key: CASSANDRA-7115
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7115
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Haebin Na
            Priority: Minor


We need a better way to expire columns than per-column TTLs.

If you set a 6-month TTL on a column in a frequently updated (and deleted; yes, this is an anti-pattern) wide row, the expired column is unlikely to actually be purged, since the row would be highly fragmented across SSTables.

To solve this problem, I suggest partitioning a column family (table) using the column key (column1) as the partition key.

This would behave like a set of column families (tables) that share the same structure, each covering a certain range of columns. A row is then deterministically fragmented by column key.

If you use a timestamp-like column key, you could easily truncate a specific partition (a sub-table or CF covering a specific range) once it is older than a certain age, without worrying about zombie tombstones.

Having many column families is not optimal, yet even with a small set, such as biyearly or quarterly partitions, this could be a whole lot more efficient than TTLed columns.
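
To make the idea concrete, here is a minimal sketch of the bucketing logic, assuming quarterly partitions and a timestamp column key. It is not Cassandra code: the function names (quarter_bucket, bucket_end, expired_buckets) and the "2014Q2" naming scheme are hypothetical and only illustrate how a column key could map deterministically to a sub-table, and how whole sub-tables could be selected for truncation.

```python
from datetime import datetime, timedelta, timezone


def quarter_bucket(ts: datetime) -> str:
    """Map a timestamp column key to a hypothetical sub-table suffix, e.g. '2014Q2'."""
    quarter = (ts.month - 1) // 3 + 1
    return f"{ts.year}Q{quarter}"


def bucket_end(bucket: str) -> datetime:
    """Return the first instant (UTC) after the quarter named by `bucket`."""
    year, quarter = int(bucket[:4]), int(bucket[5])
    if quarter == 4:
        return datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return datetime(year, quarter * 3 + 1, 1, tzinfo=timezone.utc)


def expired_buckets(buckets, now: datetime, max_age: timedelta):
    """Select buckets whose newest possible column is already older than max_age.

    Such a bucket could be truncated as a whole, with no per-column tombstones.
    """
    return [b for b in buckets if now - bucket_end(b) >= max_age]
```

For example, with buckets 2013Q3 through 2014Q2, a current time of 2014-04-30 and a maximum age of roughly six months (183 days), only 2013Q3 would be selected, because every column it can contain is at least that old. A coarser bucket keeps data around longer past its nominal age, which is the trade-off against having more column families.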

What do you think?






--
This message was sent by Atlassian JIRA
(v6.2#6252)