Posted to user@cassandra.apache.org by Kevin Burton <bu...@spinn3r.com> on 2014/05/07 02:55:48 UTC

Storing log structured data in Cassandra without compactions for performance boost.

I'm looking at storing log data in Cassandra…

Every record uses a unique timestamp as the key and the log line as the
value.

I think it would be best to just disable compactions.

- there will never be any deletes.

- all the data will be accessed by time range (probably partitioned
randomly) and read sequentially.

So every time a memtable flushes, we will just keep that SSTable forever.

Compacting the data is kind of redundant in this situation.

I was thinking the best strategy is to use setcompactionthreshold and set
the value VERY high so compactions are never triggered.

Also, it would be IDEAL to be able to tell Cassandra to just drop a full
SSTable so that I can truncate older data without having to do a major
compaction and without having to mark everything with a tombstone.  Is this
possible?



-- 

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
Skype: *burtonator*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile <https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>
War is peace. Freedom is slavery. Ignorance is strength. Corporations are
people.

Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by Kevin Burton <bu...@spinn3r.com>.
>
>
> If the data is read from a slice of a partition that has been added over
> time, there will be a part of that row in almost every sstable. That would
> mean all of them (multiple disk seeks depending on clustering order per
> sstable) would have to be read from in order to service the query.  The
> data model can help or hurt a lot, though.
>
>
Yes… totally agree, but we wouldn't do that.  The entire 'row' is immutable
and passes through the system and then expires due to TTL.

TTL is probably the way to go here, especially if Cassandra just drops the
whole SSTable on TTL expiration, which is what I think I'm hearing.


> If you set a TTL on the columns you add, C* will clean up sstables (if
> size-tiered and post 1.2) once the data has expired.  Since you never
> delete, set gc_grace_seconds to 0 so the expired TTL data doesn't linger
> as tombstones.
>
>
Thanks!

Kevin

Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by Chris Lohfink <cl...@blackbirdit.com>.
What does your data model look like?

> I think it would be best to just disable compactions.

Why? Are you never doing reads?  There is also a cost to repairs and bootstrapping when you have a ton of sstables.  This might be a premature optimization.

If the data is read from a slice of a partition that has been added over time, there will be a part of that row in almost every sstable. That would mean all of them (multiple disk seeks depending on clustering order per sstable) would have to be read from in order to service the query.  The data model can help or hurt a lot, though.
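To make the read cost concrete, here is a toy Python model (not Cassandra's storage engine) in which one partition is written across ten memtable flushes. Without compaction, a slice read of that partition has to consult every SSTable holding a fragment of it; after a full merge, only one. The `SSTable` class and the partition key format are hypothetical, and real reads can skip some files using bloom filters and per-SSTable min/max metadata, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    # Hypothetical in-memory stand-in: partition key -> {clustering key: value}.
    partitions: dict = field(default_factory=dict)


def flush_series(partition_key, rows_per_flush, flushes):
    """Write one partition over time; every memtable flush emits a new SSTable."""
    sstables = []
    ts = 0
    for _ in range(flushes):
        rows = {}
        for _ in range(rows_per_flush):
            rows[ts] = f"log line {ts}"
            ts += 1
        sstables.append(SSTable({partition_key: rows}))
    return sstables


def sstables_touched(sstables, partition_key):
    """Number of SSTables a slice read of this partition must consult."""
    return sum(1 for table in sstables if partition_key in table.partitions)


def compact(sstables):
    """Merge every SSTable into one (a full merge, for illustration only)."""
    merged = {}
    for table in sstables:
        for key, rows in table.partitions.items():
            merged.setdefault(key, {}).update(rows)
    return [SSTable(merged)]
```

With ten flushes of one device-day partition, the uncompacted read touches ten files and the compacted one touches a single file holding all 1,000 rows.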

If you set a TTL on the columns you add, C* will clean up sstables (if size-tiered and post 1.2) once the data has expired.  Since you never delete, set gc_grace_seconds to 0 so the expired TTL data doesn't linger as tombstones.
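The whole-SSTable cleanup described above can be approximated by one rule: if the latest expiration time of any cell in a file, plus gc_grace_seconds, is already in the past, the file can be discarded without being read or rewritten. The Python sketch below is a simplified model of that idea (loosely following CASSANDRA-5228, cited later in this thread); the (write_time, ttl) cell layout and the SSTable names are assumptions, and as far as I understand it the real implementation also checks overlapping SSTables before dropping anything.

```python
from typing import Dict, List, Tuple

# Hypothetical cell representation: (write_time_seconds, ttl_seconds).
Cell = Tuple[int, int]


def max_expiration(cells: List[Cell]) -> int:
    """Latest moment (epoch seconds) at which any cell in the SSTable expires."""
    return max(write_time + ttl for write_time, ttl in cells)


def fully_expired(cells: List[Cell], now: int, gc_grace_seconds: int) -> bool:
    """True if every cell has expired and its grace period has also passed."""
    return max_expiration(cells) + gc_grace_seconds <= now


def droppable(sstables: Dict[str, List[Cell]], now: int, gc_grace_seconds: int = 0) -> List[str]:
    """Names of SSTables that could be removed wholesale at time `now`."""
    return sorted(
        name for name, cells in sstables.items()
        if fully_expired(cells, now, gc_grace_seconds)
    )
```

With gc_grace_seconds at 0, a file becomes droppable as soon as its newest cell expires; with the default ten-day grace period it would wait another 864,000 seconds.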

---
Chris Lohfink 



On May 6, 2014, at 7:55 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra… 


Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by Aaron Morton <aa...@thelastpickle.com>.
If you disable compaction you will end up with a *lot* of sstables; this will hurt read performance and be a pain to manage (including making repairs and bootstrapping take longer).

STCS is not too onerous; I’d recommend leaving it on. If you want it to run less frequently, increase min_threshold.
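Raising min_threshold works because size-tiered compaction (STCS) only merges a group of similarly sized SSTables once the group has at least min_threshold members, capped at max_threshold. A simplified Python sketch of that bucketing follows. bucket_low=0.5, bucket_high=1.5 and a 50 MB min_sstable_size mirror STCS's usual defaults, but choosing the bucket with the smallest average size is a simplification of how Cassandra actually picks one.

```python
def size_tiered_buckets(sizes, bucket_low=0.5, bucket_high=1.5, min_sstable_size=50):
    """Group SSTable sizes (in MB) into buckets of similar size, roughly as STCS does."""
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            similar = bucket_low * avg <= size <= bucket_high * avg
            # Very small files are lumped together regardless of ratio.
            both_small = size < min_sstable_size and avg < min_sstable_size
            if similar or both_small:
                bucket.append(size)
                break
        else:
            buckets.append([size])
    return buckets


def next_compaction(sizes, min_threshold=4, max_threshold=32):
    """SSTables a minor compaction would merge next, or [] if no bucket qualifies."""
    candidates = [b for b in size_tiered_buckets(sizes) if len(b) >= min_threshold]
    if not candidates:
        return []
    # Simplification: prefer the bucket with the smallest average size.
    best = min(candidates, key=lambda b: sum(b) / len(b))
    return best[:max_threshold]
```

Four ~10 MB files are merged under the default min_threshold of 4, while raising it to 8 leaves them alone until more similar files accumulate.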

Cheers
Aaron

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com

On 8/05/2014, at 8:36 am, DuyHai Doan <do...@gmail.com> wrote:

> Hello Kevin
> 
>  You can disable compaction by configuring the compaction options of your table as follows:
> 
>   compaction={'min_threshold': '0', 'class': 'SizeTieredCompactionStrategy', 'max_threshold': '0'}
> 
> Regards
> 
>  Duy Hai DOAN
> 
> 
> On Wed, May 7, 2014 at 2:55 AM, Kevin Burton <bu...@spinn3r.com> wrote:
> I'm looking at storing log data in Cassandra… 
> 
> Every record is a unique timestamp for the key, and then the log line for the value.
> 
> I think it would be best to just disable compactions.
> 
> - there will never be any deletes.
> 
> - all the data will be accessed in time range (probably partitioned randomly) and sequentially.
> 
> So every time a memtable flushes, we will just keep that SSTable forever.  
> 
> Compacting the data is kind of redundant in this situation.
> 
> I was thinking the best strategy is to use setcompactionthreshold and set the value VERY high to compactions are never triggered.
> 
> Also, It would be IDEAL to be able to tell cassandra to just drop a full SSTable so that I can truncate older data without having to do a major compaction and without having to mark everything with a tombstone.  Is this possible?
> 
> 
> 
> -- 
> 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
> 
> 


Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by DuyHai Doan <do...@gmail.com>.
Hello Kevin

 You can disable compaction by configuring the compaction options of your
table as follows:

  compaction={'min_threshold': '0', 'class': 'SizeTieredCompactionStrategy', 'max_threshold': '0'}
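For reference, an option map like the one above is applied with an ALTER TABLE statement. The small Python helper below only renders that CQL string; the keyspace and table names are hypothetical, and whether setting both thresholds to 0 disables compaction is the claim made in this message, which may depend on your Cassandra version.

```python
def compaction_options_cql(keyspace: str, table: str, options: dict) -> str:
    """Render an ALTER TABLE statement that sets a table's compaction map."""
    # CQL map literals here use single-quoted strings for both keys and values.
    pairs = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{pairs}}};"
```

Since Python dicts keep insertion order, the rendered map lists the options in the order they were given.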

Regards

 Duy Hai DOAN


On Wed, May 7, 2014 at 2:55 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra…

Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by Ben Bromhead <be...@instaclustr.com>.
If you make the timestamp the partition key, you won't be able to do range queries over time (unless you use an ordered partitioner).
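The range-query limitation follows from how tokens are assigned. With a hash partitioner, adjacent timestamps land at unrelated positions on the ring, so a token-range scan does not return a contiguous time range; with an ordered partitioner the token is effectively the key itself. The Python sketch below uses the standard library's MD5 as a stand-in hash (RandomPartitioner is MD5-based, while Murmur3Partitioner is the default since 1.2), so it illustrates the effect rather than reproducing Cassandra's exact token values.

```python
import hashlib


def md5_token(key: str) -> int:
    """Approximate hash-partitioner token: MD5 digest read as an unsigned integer."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


def ring_order(keys, token_fn):
    """Order in which keys sit on the token ring under a given token function."""
    return sorted(keys, key=token_fn)


# One minute of per-second timestamps used directly as partition keys.
timestamps = [f"2014-05-07T02:55:{s:02d}" for s in range(60)]
ordered = ring_order(timestamps, lambda key: key)  # ordered partitioner: token == key
hashed = ring_order(timestamps, md5_token)         # hash partitioner: scrambled order
```

Under the ordered partitioner the ring order matches time order; under the hash it is scrambled, which is why a time-range query needs a clustering column instead.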

Assuming you are logging from multiple devices, you will want your partition key to be the device ID and the date, your clustering key to be the timestamp (a timeuuid is good for preventing collisions), and the log message, level, etc. as the other columns.
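A sketch of that layout, assuming a hypothetical table named logs: the composite partition key (device_id, day) bounds each partition to one device-day, and ts is the timeuuid clustering column. The Python helpers compute the partition a log line belongs to and the partitions a time-range query has to visit; all names here are illustrative, not taken from the thread.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical CQL schema for the layout described above.
SCHEMA = """
CREATE TABLE logs (
    device_id text,
    day       text,      -- UTC date, e.g. '2014-05-07'
    ts        timeuuid,  -- clustering column; a timeuuid avoids collisions
    level     text,
    message   text,
    PRIMARY KEY ((device_id, day), ts)
);
"""


def partition_key(device_id: str, when: datetime) -> tuple:
    """One partition per device per UTC day."""
    return (device_id, when.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def partitions_for_range(device_id: str, start: datetime, end: datetime) -> list:
    """Every (device_id, day) partition a query over [start, end] must read."""
    day = start.astimezone(timezone.utc).date()
    last = end.astimezone(timezone.utc).date()
    keys = []
    while day <= last:
        keys.append((device_id, day.isoformat()))
        day += timedelta(days=1)
    return keys
```

A query spanning 23:00 on one day to 01:00 two days later therefore reads three partitions, each of which can be sliced by ts.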

Then you can also create a new table for every week (or day/month, depending on how much granularity you want) and just write to the current week's table. This step allows you to delete old data without Cassandra using tombstones (you just drop the table for the week of logs you want to delete).
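The per-week table scheme needs a deterministic table name for each week. One possible Python sketch uses ISO week numbers and a hypothetical logs_YYYY_wNN naming convention: writers compute the current week's table, and a retention job drops the table that has just aged out. The ISO year is used rather than the calendar year so that days around 1 January map to the same week's table.

```python
from datetime import date, timedelta


def weekly_table(day: date, prefix: str = "logs") -> str:
    """Name of the (hypothetical) per-week table that holds a given day's logs."""
    iso_year, iso_week, _ = day.isocalendar()
    return f"{prefix}_{iso_year}_w{iso_week:02d}"


def drop_expired_week(today: date, retention_weeks: int, prefix: str = "logs") -> str:
    """CQL that drops the week which has just fallen outside the retention window."""
    expired = today - timedelta(weeks=retention_weeks)
    return f"DROP TABLE IF EXISTS {weekly_table(expired, prefix)};"
```

Dropping a whole table removes its SSTables directly, which is what lets this approach avoid tombstones entirely.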

For a much clearer explanation, see http://www.slideshare.net/patrickmcfadin/cassandra-20-and-timeseries (the last few slides).

As for compaction, I would leave it enabled, as having lots of sstables hanging around can make range queries slower (the query has more files to visit). See http://stackoverflow.com/questions/8917882/cassandra-sstables-and-compaction (a little old but still relevant). Compaction also fixes up things like merging row fragments (when you write new columns to the same row).
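Merging row fragments means reconciling the cells of one row that are spread over several SSTables; for ordinary cells the value with the newest write timestamp wins. A minimal Python sketch of that last-write-wins merge follows. The fragment representation is an assumption, and the sketch ignores tombstones and the tie-breaking Cassandra applies when two timestamps are equal.

```python
from typing import Dict, List, Tuple

# Hypothetical fragment representation: column name -> (write_timestamp, value).
Fragment = Dict[str, Tuple[int, str]]


def merge_fragments(fragments: List[Fragment]) -> Dict[str, str]:
    """Reconcile fragments of one row found in several SSTables.

    For each column, keep the value carrying the highest write timestamp.
    """
    winners: Dict[str, Tuple[int, str]] = {}
    for fragment in fragments:
        for column, (ts, value) in fragment.items():
            if column not in winners or ts > winners[column][0]:
                winners[column] = (ts, value)
    return {column: value for column, (_, value) in winners.items()}
```

Without compaction, every read has to perform this merge across all the files holding the row; compaction does it once and writes the result out.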


Ben Bromhead
Instaclustr | www.instaclustr.com | @instaclustr | +61 415 936 359


On 07/05/2014, at 10:55 AM, Kevin Burton <bu...@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra… 


Re: Storing log structured data in Cassandra without compactions for performance boost.

Posted by Nate McCall <na...@thelastpickle.com>.
The following article has some good information for what you describe:
http://www.datastax.com/dev/blog/optimizations-around-cold-sstables

Some related tickets which will provide background:
https://issues.apache.org/jira/browse/CASSANDRA-5228
https://issues.apache.org/jira/browse/CASSANDRA-5515


On Tue, May 6, 2014 at 7:55 PM, Kevin Burton <bu...@spinn3r.com> wrote:

> I'm looking at storing log data in Cassandra…


-- 
-----------------
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com