Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/08/15 06:28:52 UTC

Could table partitioning be implemented using a custom compaction strategy?

We use log structured tables to hold logs for analysis.

It's basically append-only and immutable.  Every record carries the
timestamp of when it was inserted.

Having this in ONE big monolithic table can be problematic.

1.  compactions have to compact old data that might not even be used often.

2.  it might be nice to not have the old data touched on disk so that you
can just use it for map reduce.  Being able to fadvise away old data so
that it's not in cache can be valuable.

3.  the ability to drop large chunks of old data is also useful.  For
example, if you run out of disk space, you can just drop the oldest day's
worth of data without having to use tombstones.

MySQL has a somewhat decent partition engine:

http://dev.mysql.com/doc/refman/5.1/en/partitioning.html

It seems like this could be easily implemented using a custom compaction
strategy.

Essentially, you would take each SSTable and first group them into
partitions.  So if you were using day partitions, you could take all
SSTables for that day, and then use another, nested compaction strategy,
like leveled, on just those SSTables.
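
As a rough illustration of the grouping step only, here is a minimal Python
sketch.  It is not Cassandra code: SSTableInfo, day_bucket and group_by_day
are hypothetical names, and it assumes each SSTable exposes the newest write
timestamp it contains, in microseconds since the epoch.  Each resulting
bucket would then be handed to the nested strategy.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SSTableInfo:
    # Hypothetical stand-in for SSTable metadata; not a Cassandra class.
    name: str
    max_timestamp_micros: int  # newest write timestamp held in the file


def day_bucket(table):
    """Return the UTC day (ISO date string) that an SSTable is assigned to."""
    seconds = table.max_timestamp_micros // 1_000_000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).date().isoformat()


def group_by_day(tables):
    """Group SSTables into day partitions keyed by ISO date."""
    buckets = defaultdict(list)
    for table in tables:
        buckets[day_bucket(table)].append(table)
    return dict(buckets)
```

Bucketing by the newest timestamp is just one possible choice here; an
SSTable that spans midnight would need a rule of its own.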

The older days would yield one SSTable per day, once all the individual
SSTables are compacted.   For a month, you would need a minimum of 30
SSTables.

You would need to implement some custom ways to prune the older partitions,
and you'd also need some way to define the partitions.

… but maybe an initial prototype could just read from a configuration file,
or another system table which defines them…
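
A pruning rule could be as simple as "keep the newest N day partitions".
The Python sketch below is only an illustration under that assumption:
partitions_to_drop and keep_days are hypothetical names, and partitions are
assumed to be identified by ISO date strings such as "2014-08-15", which
sort chronologically.

```python
def partitions_to_drop(day_keys, keep_days):
    """Return the oldest day keys, keeping only the newest `keep_days`."""
    if keep_days < 0:
        raise ValueError("keep_days must be non-negative")
    ordered = sorted(day_keys)  # ISO dates sort in chronological order
    # Everything before the last `keep_days` entries is eligible for removal.
    return ordered[:max(0, len(ordered) - keep_days)]
```

The SSTables behind each returned day could then be deleted as whole files,
which is the "no tombstones" property described in point 3.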

(just thinking out loud)

Kevin

-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

Re: Could table partitioning be implemented using a custom compaction strategy?

Posted by DuyHai Doan <do...@gmail.com>.
Take a look at: https://issues.apache.org/jira/browse/CASSANDRA-6602

There is a patch for a compaction strategy dedicated to time series data.
The discussion in the comments is also interesting.


