Posted to commits@cassandra.apache.org by "Haebin Na (JIRA)" <ji...@apache.org> on 2014/04/30 04:38:15 UTC

[jira] [Updated] (CASSANDRA-7115) Column Family (Table) partitioning with column keys as partition keys (Sorta TTLed Table)

     [ https://issues.apache.org/jira/browse/CASSANDRA-7115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haebin Na updated CASSANDRA-7115:
---------------------------------

    Description: 
We need a better way to expire columns than TTLed columns.

If you set a 6-month TTL on a column in a frequently updated (or deleted; yes, this is an anti-pattern) wide row, the expired column is unlikely to actually be removed, since the row would be highly fragmented across many SSTables.

To solve this problem, I suggest partitioning a column family (table) using the column key (column1) as the partition key.

It would behave like a set of column families (tables) that share the same structure, each covering a certain range of columns. This means that a row is deterministically fragmented by column key.

If you use a timestamp-like column key, you could easily truncate a specific partition (a sub-table or CF covering a specific range) once it is older than a certain age, without worrying about zombie tombstones.

It is not optimal to have many column families, yet even with a small set, such as biyearly or quarterly partitions, it could be a whole lot more efficient than TTLed columns.
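
To make the idea concrete, here is a minimal sketch in Python of the bucketing logic only, assuming quarterly partitions. Nothing in it is an existing Cassandra API: the function names, the `events_2014q2` naming scheme, and the `keep_quarters` parameter are all hypothetical, chosen for illustration.

```python
from datetime import datetime

def bucket_of(ts):
    """Map a timestamp-like column key to a (year, quarter) bucket."""
    return (ts.year, (ts.month - 1) // 3 + 1)

def table_name(base, bucket):
    """Hypothetical naming scheme for the sub-table that holds one bucket."""
    year, quarter = bucket
    return f"{base}_{year}q{quarter}"

def expired_buckets(buckets, now, keep_quarters):
    """Return the buckets whose newest possible column is at least
    keep_quarters quarters old, so the whole sub-table can be
    truncated at once instead of expiring columns one by one."""
    now_index = now.year * 4 + (now.month - 1) // 3
    result = []
    for year, quarter in sorted(buckets):
        bucket_index = year * 4 + (quarter - 1)
        # A bucket that is d quarters behind "now" only holds data that is
        # at least d - 1 quarters old, hence the strict comparison.
        if now_index - bucket_index > keep_quarters:
            result.append((year, quarter))
    return result
```

For example, on 2014-04-30 with `keep_quarters=2` (roughly the 6-month TTL above), buckets 2013 Q3 through 2014 Q2 would yield only 2013 Q3 as safe to truncate, since 2013 Q4 may still hold columns younger than 6 months.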

What do you think?




  was:
We need a better way to expire columns than TTLed columns.

If you set a 6-month TTL on a column in a frequently updated (or deleted; yes, this is an anti-pattern) wide row, the expired column is unlikely to actually be removed, since the row would be highly fragmented.

To solve this problem, I suggest partitioning a column family (table) using the column key (column1) as the partition key.

It would behave like a set of column families (tables) that share the same structure, each covering a certain range of columns. This means that a row is deterministically fragmented by column key.

If you use a timestamp-like column key, you could easily truncate a specific partition (a sub-table or CF covering a specific range) once it is older than a certain age, without worrying about zombie tombstones.

It is not optimal to have many column families, yet even with a small set, such as biyearly or quarterly partitions, we could achieve a whole lot more efficiency than TTLed columns.

What do you think?




        Summary: Column Family (Table) partitioning with column keys as partition keys (Sorta TTLed Table)  (was: Partitioned Column Family (Table) based on Column Keys (Sorta TTLed Table))

> Column Family (Table) partitioning with column keys as partition keys (Sorta TTLed Table)
> -----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7115
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7115
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Haebin Na
>            Priority: Minor
>              Labels: features
>
> We need a better way to expire columns than TTLed columns.
> If you set a 6-month TTL on a column in a frequently updated (or deleted; yes, this is an anti-pattern) wide row, the expired column is unlikely to actually be removed, since the row would be highly fragmented.
> To solve this problem, I suggest partitioning a column family (table) using the column key (column1) as the partition key.
> It would behave like a set of column families (tables) that share the same structure, each covering a certain range of columns. This means that a row is deterministically fragmented by column key.
> If you use a timestamp-like column key, you could easily truncate a specific partition (a sub-table or CF covering a specific range) once it is older than a certain age, without worrying about zombie tombstones.
> It is not optimal to have many column families, yet even with a small set, such as biyearly or quarterly partitions, it could be a whole lot more efficient than TTLed columns.
> What do you think?


