Posted to user@cassandra.apache.org by "Steinmaurer, Thomas" <th...@dynatrace.com> on 2017/08/09 08:41:56 UTC

Questions on time series use case, tombstones, TWCS

Hello,

our top contributor from a data volume perspective is time series data. We have been running STCS since our initial production deployment in 2014, across several clusters with a varying number of nodes, currently with a maximum of 9 nodes per cluster per AWS region on m4.xlarge / EBS gp2 storage. We have gone through a series of Cassandra versions starting with 1.2 and are currently on DSC 2.1.15, soon to be replaced by Apache Cassandra 2.1.18 across all deployments. Lately we switched from Thrift (Astyanax) to Native/CQL (DataStax driver). Overall we are pretty happy with the stability and the scale-out capabilities.

We store time series data in different resolutions, from 1min up to 1day aggregates per "time slot". Each resolution has its own column family / table, and a periodic worker executes our business logic for time series aging, rolling up e.g. 1min => 5min => ... resolutions and deleting from the source resolution according to our per-resolution retention policy. So deletions happen much later (at least > 14 days after the data was written). We don't use TTLs on written time series data (in production; see the TWCS testing below), so purging is handled exclusively by explicit DELETEs in our aging business logic, which create tombstones.
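
To illustrate, the 1min table and the aging DELETE look roughly like this (a simplified sketch; table, column and key names are placeholders, not our actual schema):

    CREATE TABLE ts_1min (
        series_id text,
        day       text,        -- partition bucket, e.g. '2017-07-20'
        ts        timestamp,
        value     double,
        PRIMARY KEY ((series_id, day), ts)
    );

    -- Aging worker: after rolling 1min data up into the 5min table, purge the
    -- source partition. These partition-level DELETEs are what create the tombstones.
    DELETE FROM ts_1min WHERE series_id = 'host42.cpu' AND day = '2017-07-20';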

Naturally, with STCS and late explicit deletions / tombstones, it takes a long time to finally reclaim disk space; even worse, we are now running a major compaction every X weeks. We are currently also testing with STCS min_threshold = 2 etc., but all in all, none of this feels like a long-term solution. I guess there is nothing else we are missing from a configuration/settings side with STCS? Single-SSTable compaction might not kick in either, because when checking with sstablemetadata, the estimated droppable tombstones value for our time series SSTables is pretty much 0.0 all the time. I guess that is because we don't write with TTL?

TWCS caught my eye in 2015 I think, and even more at the Cassandra Summit 2016 and other tombstone-related talks. Cassandra 3.0 is around 6 months away for us, thus initial testing was with 2.1.18 patched with TWCS from GitHub.

TWCS looks like exactly what we need: in our tests, once we start writing with TTL we end up with a single SSTable per passed window, and data (SSTables) older than TTL + grace gets automatically removed from disk. Even with out-of-order DELETEs from our business logic enabled, SSTable purging does not seem to get stuck. Not sure if this is expected. Writing with TTL is also a bit problematic in case our retention policy changes, either in general or for particular customers.
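
For reference, this is roughly how the table was configured in our TWCS test (a sketch; the plain class name is the one shipped with 3.0+, with the patched 2.1 build the fully qualified class name of that build has to be used instead, and the 1-day window / 14-day TTL are just example values):

    ALTER TABLE ts_1min WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1'
    };

    -- Writes carry an explicit TTL matching the retention of this resolution
    INSERT INTO ts_1min (series_id, day, ts, value)
    VALUES ('host42.cpu', '2017-08-09', '2017-08-09 08:00:00', 0.73)
    USING TTL 1209600;  -- 14 days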

A few questions, as we need some short-term (with C* 2.1) and long-term (with C* 3.0) mitigation:

*         With STCS, estimated droppable tombstones always being 0.0 (thus also no automatic single-SSTable compaction may happen): Is this a matter of not writing with TTL? If yes, would enabling TTL with STCS improve the disk reclaim situation, because then single-SSTable compactions would kick in?

*         What is the semantics of "default_time_to_live" at table level? From: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html : "After the default_time_to_live TTL value has been exceeded, Cassandra tombstones the entire table". What does "entire table" mean? Hopefully / I guess I don't end up with an empty table every X past TTLs?

*         Anything else I'm missing regarding STCS and reclaiming disk space earlier in our TS use case?

*         I know changing the compaction strategy is a matter of executing ALTER TABLE (or temporarily via JMX for a single node), but as we have legacy data written without TTL, I wonder if we may end up with stuck SSTables again

*         In case of stuck SSTables with any compaction strategy, what is the best way to debug/analyze why they got stuck (overlapping etc.)?

Thanks a lot and sorry for the lengthy email.

Thomas
The contents of this e-mail are intended for the named addressee only. It contains information that may be confidential. Unless you are the named addressee or an authorized designee, you may not copy or use it, or disclose it to anyone else. If you received it in error please notify us immediately and then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a company registered in Linz whose registered office is at 4040 Linz, Austria, Freistädterstraße 313

Re: Questions on time series use case, tombstones, TWCS

Posted by kurt greaves <ku...@instaclustr.com>.
>
> With STCS, estimated droppable tombstones always being 0.0 (thus also no
> automatic single-SSTable compaction may happen): Is this a matter of not
> writing with TTL? If yes, would enabling TTL with STCS improve the disk
> reclaim situation, because then single-SSTable compactions would kick in?

Estimated droppable tombstones will increase if you have tombstones in an
SSTable. It is just an estimate, but if you are deleting in any way
(TTL or manual) it should increase. Enabling TTL would increase disk
reclaim, but only because you would be deleting more data. Doing so only
makes sense if you actually can TTL your data. Probably don't do that
without fully understanding all the potential impacts to your data. In fact
it's not a great idea with STCS anyway.
Also, single-SSTable compactions aren't very good at removing tombstones
anyway, so they probably shouldn't be relied on. To get rid of a tombstone, the
tombstone needs to be compacted with every other SSTable that contains data
the tombstone covers.

> What is the semantics of “default_time_to_live” at table level? From:
> http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html :
> “After the default_time_to_live TTL value has been exceeded, Cassandra
> tombstones the entire table”. What does “entire table” mean? Hopefully / I
> guess I don’t end up with an empty table every X past TTLs?

I didn't read that page, but Cassandra doesn't tombstone the entire table.
If a default TTL is set on the table, then whenever any data inserted
into the table surpasses the TTL it becomes expired (essentially a
tombstone). You won't end up with an empty table unless you stop inserting
data.
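
A quick sketch of the behaviour (made-up table and values): the table-level
default is applied per written cell, and an explicit TTL on a statement
overrides it:

    CREATE TABLE events (
        id   text,
        ts   timestamp,
        data text,
        PRIMARY KEY (id, ts)
    ) WITH default_time_to_live = 86400;  -- 1 day, applied to each write

    -- Expires 86400s after the write (table default)
    INSERT INTO events (id, ts, data) VALUES ('a', '2017-08-09 08:00:00', 'x');

    -- Explicit TTL overrides the table default for this write only
    INSERT INTO events (id, ts, data) VALUES ('a', '2017-08-09 09:00:00', 'y')
    USING TTL 3600;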

> I know changing the compaction strategy is a matter of executing ALTER TABLE (or
> temporarily via JMX for a single node), but as we have legacy data being
> written without TTL, I wonder if we may end up with stuck SSTables again

Yes, if you have legacy data you should make sure you delete it before you
change compaction strategy to TWCS. Otherwise you may have SSTables with
live data that never get deleted.

Short on time, sorry I didn't answer all your questions!

Re: Questions on time series use case, tombstones, TWCS

Posted by Erick Ramirez <fl...@gmail.com>.
Not sure if these are what Jeff was referring to, but as a workaround you
can configure the following STCS compaction subproperties:
- min_threshold - set to 2 so that a minimum of only 2 similar-sized
sstables is required to trigger a minor compaction, instead of the default 4
- tombstone_threshold - set to 0.1 so that if at least 10% of an sstable
is tombstones, Cassandra will compact that sstable alone instead of waiting
for the higher default ratio of 0.2
- unchecked_tombstone_compaction - set to true to allow Cassandra to run
tombstone compactions without first checking whether an sstable is eligible
for compaction
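
For example, something along these lines (keyspace and table name are
placeholders, values as above):

    ALTER TABLE myks.ts_1min WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'min_threshold': '2',
        'tombstone_threshold': '0.1',
        'unchecked_tombstone_compaction': 'true'
    };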

WARNING - For future reference, this is just a workaround. It isn't a fix
for clusters with bad data models. Consider these as buying your cluster
some breathing space while you redesign your data model. Cheers!


Re: Questions on time series use case, tombstones, TWCS

Posted by Jeff Jirsa <jj...@gmail.com>.
The deleting compaction strategy from protectwise (https://github.com/protectwise/cassandra-util/blob/master/deleting-compaction-strategy/README.md) was written (I believe) to solve a similar problem - business based deletion rules to enable flexible TTLs. May want to glance at that.

Other answers inline below 


-- 
Jeff Jirsa


> On Aug 9, 2017, at 1:41 AM, Steinmaurer, Thomas <th...@dynatrace.com> wrote:
> 
> Hello,
>  
> our top contributor from a data volume perspective is time series data. We have been running STCS since our initial production deployment in 2014, across several clusters with a varying number of nodes, currently with a maximum of 9 nodes per cluster per AWS region on m4.xlarge / EBS gp2 storage. We have gone through a series of Cassandra versions starting with 1.2 and are currently on DSC 2.1.15, soon to be replaced by Apache Cassandra 2.1.18 across all deployments. Lately we switched from Thrift (Astyanax) to Native/CQL (DataStax driver). Overall we are pretty happy with the stability and the scale-out capabilities.
>  
> We store time series data in different resolutions, from 1min up to 1day aggregates per “time slot”. Each resolution has its own column family / table, and a periodic worker executes our business logic for time series aging, rolling up e.g. 1min => 5min => … resolutions and deleting from the source resolution according to our per-resolution retention policy. So deletions happen much later (at least > 14 days after the data was written). We don’t use TTLs on written time series data (in production; see the TWCS testing below), so purging is handled exclusively by explicit DELETEs in our aging business logic, which create tombstones.
>  
> Naturally, with STCS and late explicit deletions / tombstones, it takes a long time to finally reclaim disk space; even worse, we are now running a major compaction every X weeks. We are currently also testing with STCS min_threshold = 2 etc., but all in all, none of this feels like a long-term solution. I guess there is nothing else we are missing from a configuration/settings side with STCS? Single-SSTable compaction might not kick in either, because when checking with sstablemetadata, the estimated droppable tombstones value for our time series SSTables is pretty much 0.0 all the time. I guess that is because we don’t write with TTL?


Or you aren't issuing deletes; explicit deletes past GCGS will cause that number to increase

>  
> TWCS caught my eye in 2015 I think, and even more at the Cassandra Summit 2016 and other tombstone-related talks. Cassandra 3.0 is around 6 months away for us, thus initial testing was with 2.1.18 patched with TWCS from GitHub.
>  
> TWCS looks like exactly what we need: in our tests, once we start writing with TTL we end up with a single SSTable per passed window, and data (SSTables) older than TTL + grace gets automatically removed from disk. Even with out-of-order DELETEs from our business logic enabled, SSTable purging does not seem to get stuck. Not sure if this is expected. Writing with TTL is also a bit problematic in case our retention policy changes, either in general or for particular customers.

Search for my Cassandra summit talk from 2016 - there are a few other compaction options you probably want to set to more aggressively trigger single-sstable compaction to help unstick it.
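
For example, something along these lines (a sketch of the kind of options meant, not necessarily the exact values from the talk; assumes the TWCS class name as shipped with 3.0+ and a placeholder table name):

    ALTER TABLE myks.ts_1min WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',
        'compaction_window_unit': 'DAYS',
        'compaction_window_size': '1',
        'tombstone_threshold': '0.1',
        'tombstone_compaction_interval': '86400',
        'unchecked_tombstone_compaction': 'true'
    };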

>  
> A few questions, as we need some short-term (with C* 2.1) and long-term (with C* 3.0) mitigation:
> ·         With STCS, estimated droppable tombstones always being 0.0 (thus also no automatic single-SSTable compaction may happen): Is this a matter of not writing with TTL? If yes, would enabling TTL with STCS improve the disk reclaim situation, because then single-SSTable compactions would kick in?
> ·         What is the semantics of “default_time_to_live” at table level? From: http://docs.datastax.com/en/cql/3.1/cql/cql_using/use_expire_c.html : “After the default_time_to_live TTL value has been exceeded, Cassandra tombstones the entire table”. What does “entire table” mean?

It probably means sstable, but even that isn't really accurate - that's a doc bug 

> Hopefully / I guess I don’t end up with an empty table every X past TTLs?
> ·         Anything else I’m missing regarding STCS and reclaiming disk space earlier in our TS use case?

LCS rewrites much more aggressively on partition updates - if you can spare the IO it's likely going to be more efficient at purging deleted data than STCS
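
For example (a sketch; the table name is a placeholder and 160 MB is simply the default target sstable size):

    ALTER TABLE myks.ts_1min WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'
    };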

> ·         I know changing the compaction strategy is a matter of executing ALTER TABLE (or temporarily via JMX for a single node), but as we have legacy data written without TTL, I wonder if we may end up with stuck SSTables again
> ·         In case of stuck SSTables with any compaction strategy, what is the best way to debug/analyze why they got stuck (overlapping etc.)?

sstableexpiredblockers
