Posted to user@cassandra.apache.org by Kevin O'Connor <ke...@reddit.com> on 2016/09/01 07:53:57 UTC

STCS Compaction with wide rows & TTL'd data

We're running C* 1.2.11 and have two CFs, one called OAuth2AccessToken and
one called OAuth2AccessTokensByUser. OAuth2AccessToken has the token as the
row key, and the columns are some data about the OAuth token. There's a TTL
set on it, usually 3600 seconds, but it can be higher (up to 1 month).
OAuth2AccessTokensByUser has the user as the row key, and then all of the
user's token identifiers as column values. Each of the column values has a
TTL set to the same as the access token it corresponds to.
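
Roughly, in CQL terms, the layout looks like this (column names here are
illustrative, not our actual schema):

    -- sketch of the two CFs described above; names and types are placeholders
    CREATE TABLE "OAuth2AccessToken" (
        token   text PRIMARY KEY,   -- row key: the access token itself
        user_id text,               -- plus a few columns of token metadata
        scope   text
    );

    CREATE TABLE "OAuth2AccessTokensByUser" (
        user_id text,
        token   text,
        PRIMARY KEY (user_id, token)   -- one wide row per user
    );

    -- each entry is written with the same TTL as the token it refers to
    INSERT INTO "OAuth2AccessTokensByUser" (user_id, token)
    VALUES ('example-user', 'example-token') USING TTL 3600;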

The OAuth2AccessToken CF takes up around 6 GB on disk, whereas the
OAuth2AccessTokensByUser CF takes around 110 GB. If I use sstablemetadata,
I can see the droppable tombstone ratio is around 90% for the larger
sstables.

My question is: why aren't these tombstones getting compacted away? I'm
guessing it's because we use STCS and the large sstables that have built
up over time are never considered for compaction. Would LCS be a better
fit for keeping the tombstones in check?

I've also tried forceUserDefinedCompaction via JMX on some of the largest
sstables and it just creates a new sstable of the exact same size, which
was pretty surprising. Why would this explicit request to compact an
sstable not remove tombstones?

Thanks!

Kevin

Re: STCS Compaction with wide rows & TTL'd data

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
Also, if you can get to at least 2.0 you can use
TimeWindowCompactionStrategy, which works a lot better than STCS with
time-series data that has TTLs.
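
Switching is just a per-table compaction change, roughly like the sketch
below (the exact class name depends on whether TWCS is bundled with your
version or loaded as a separate jar, and the window settings are only
examples to size against your TTLs):

    ALTER TABLE "OAuth2AccessTokensByUser" WITH compaction = {
        'class': 'TimeWindowCompactionStrategy',  -- external jar may need the fully-qualified class name
        'compaction_window_unit': 'HOURS',        -- bucket sstables by time window
        'compaction_window_size': '6'             -- example: 6-hour windows
    };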


Re: STCS Compaction with wide rows & TTL'd data

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
What's your gc_grace_seconds set to?  Is it possible you have a lot of
tombstones that haven't reached the GC grace time yet?
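
For reference, gc_grace_seconds is a per-table setting that defaults to
864000 seconds (10 days); it shows up in DESCRIBE TABLE output and can be
changed like this (the value below is only an example):

    -- per-table grace period before expired/deleted data can be purged
    ALTER TABLE "OAuth2AccessTokensByUser" WITH gc_grace_seconds = 86400;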


Re: STCS Compaction with wide rows & TTL'd data

Posted by Kevin O'Connor <ke...@reddit.com>.
On Fri, Sep 2, 2016 at 9:33 AM, Mark Rose <ma...@markrose.ca> wrote:

> Hi Kevin,
>
> The tombstones will live in an sstable until it gets compacted. Do you
> have a lot of pending compactions? If so, increasing the number of
> parallel compactors may help.


Nope, we're keeping up with compactions fine - there are only ever 1 or 2
running at a time per node.


> You may also be able to tune the STCS
> parameters. Here's a good explanation of how it works:
> https://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra/


Yeah interesting - I'd like to try that. Is there a way to verify what the
settings are before changing them? DESCRIBE TABLE doesn't seem to show the
compaction subproperties.
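
(Maybe the schema tables have it? Something like the query below - the
keyspace name is just a placeholder, and I haven't verified the column
names on 1.2.)

    SELECT compaction_strategy_class, compaction_strategy_options
    FROM system.schema_columnfamilies
    WHERE keyspace_name = 'oauth'   -- placeholder keyspace name
      AND columnfamily_name = 'OAuth2AccessTokensByUser';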


> Anyway, LCS would probably be a better fit for your use case. LCS
> would help with eliminating tombstones, but it may also result in
> dramatically higher CPU usage for compaction. If LCS compaction can
> keep up, in addition to getting rid of tombstones faster, LCS should
> reduce the number of sstables that must be read to return the row and
> have a positive impact on read latency. STCS is a bad fit for rows
> that are updated frequently (which includes rows with TTL'ed data).
>

Thanks - that may end up being where we go with this.

> Also, you may have an error in your application design. OAuth Access
> Tokens are designed to have a very short lifetime of seconds or
> minutes. On access token expiry, a Refresh Token should be used to get
> a new access token. A long-lived access token is a dangerous thing, as
> there is no way to disable it (revocation works by disabling the refresh
> token, which prevents the creation of new access tokens).
>

Yeah, noted. We only allow longer-lived access tokens in some very specific
scenarios, so they are much less likely to be in that CF than the standard
3600s ones, but they're there.



Re: STCS Compaction with wide rows & TTL'd data

Posted by Mark Rose <ma...@markrose.ca>.
Hi Kevin,

The tombstones will live in an sstable until it gets compacted. Do you
have a lot of pending compactions? If so, increasing the number of
parallel compactors may help. You may also be able to tune the STCS
parameters. Here's a good explanation of how it works:
https://shrikantbang.wordpress.com/2014/04/22/size-tiered-compaction-strategy-in-apache-cassandra/
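
As a rough sketch (double-check the property names against the 1.2
documentation, and treat the values as starting points rather than
recommendations), the tombstone-related STCS subproperties are set per
table:

    ALTER TABLE "OAuth2AccessTokensByUser" WITH compaction = {
        'class': 'SizeTieredCompactionStrategy',
        'tombstone_threshold': '0.2',              -- droppable-tombstone ratio that triggers a single-sstable compaction
        'tombstone_compaction_interval': '86400',  -- at most one such compaction per sstable per day
        'min_threshold': '4'                       -- similar-sized sstables required for a normal STCS compaction
    };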

Anyway, LCS would probably be a better fit for your use case. LCS
would help with eliminating tombstones, but it may also result in
dramatically higher CPU usage for compaction. If LCS compaction can
keep up, in addition to getting rid of tombstones faster, LCS should
reduce the number of sstables that must be read to return the row and
have a positive impact on read latency. STCS is a bad fit for rows
that are updated frequently (which includes rows with TTL'ed data).
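
If you do try LCS, the switch itself is a single schema change along these
lines (the sstable size here is a commonly used value rather than the 1.2
default, so treat it as a placeholder and watch the compaction load after
changing it):

    ALTER TABLE "OAuth2AccessTokensByUser" WITH compaction = {
        'class': 'LeveledCompactionStrategy',
        'sstable_size_in_mb': '160'   -- target size per sstable for LCS
    };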

Also, you may have an error in your application design. OAuth Access
Tokens are designed to have a very short lifetime of seconds or
minutes. On access token expiry, a Refresh Token should be used to get
a new access token. A long-lived access token is a dangerous thing, as
there is no way to disable it (revocation works by disabling the refresh
token, which prevents the creation of new access tokens).

-Mark
