Posted to user@cassandra.apache.org by "taka-t@fujitsu.com" <ta...@fujitsu.com> on 2017/12/11 01:08:38 UTC

Tombstoned data seems to remain after compaction

Hi All,


I'm using SSTables with the Size Tiered Compaction Strategy and
the default gc grace period of 10 days.

The sstablemetadata command shows the following Estimated tombstone drop times
after a minor compaction on 9th Dec, 2017.

(excerpt)
Estimated tombstone drop times:%n
1510934467:      2475 * 2017.11.18
1510965112:       135
1510983500:       225
1511003962:       105
1511021113:      2280
1511037818:        30
1511055563:       120
1511075445:       165
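(The annotated date is just the first column converted from epoch seconds, e.g.:)

----
# 1st column = epoch seconds of the estimated drop time,
# 2nd column = estimated count in that bucket (GNU date shown)
date -u -d @1510934467   # => Fri Nov 17 16:01:07 UTC 2017 (18th Nov JST)
----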


From the output above, I gather the SSTable still holds records that
should have been deleted on 18th Nov, 2017. Is my understanding
correct?

If my understanding is correct, could someone tell me why that
expired data remains after compaction?




Regards,
Takashima

----------------------------------------------------------------------
Toshiaki Takashima
Toyama Fujitsu Limited
+810764553131, ext. 7260292355 

----------------------------------------------------------------------



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@cassandra.apache.org
For additional commands, e-mail: user-help@cassandra.apache.org


RE: Tombstoned data seems to remain after compaction

Posted by "taka-t@fujitsu.com" <ta...@fujitsu.com>.
Hi Kurt,


Thank you very much for your reply.
Well, I’ll try it on a test environment first, just in case ☺




Regards,
Takashima



Re: Tombstoned data seems to remain after compaction

Posted by kurt greaves <ku...@instaclustr.com>.
As long as you've limited the throughput of compactions you should be fine
(by default it's 16 MB/s; this can be changed through nodetool
setcompactionthroughput or in the yaml) - it will be no different from any
other compaction occurring, the compaction will just take longer. You should
be aware, however, that a major compaction can use up to double the disk
space currently utilised by that table. Considering you've got lots of
tombstones it will probably be a lot less than double, but it will still be
significant, so ensure you have enough free space for the compaction to
complete.
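
For example, a rough sketch of throttling around the major compaction
(keyspace/table names and the data path are placeholders):

----
# Check the current throughput cap first (0 means unthrottled)
nodetool getcompactionthroughput

# Optionally lower the cap while the major compaction runs,
# then restore the 16 MB/s default afterwards
nodetool setcompactionthroughput 8
nodetool compact my_keyspace my_table
nodetool setcompactionthroughput 16

# The major compaction can need up to ~2x the table's current disk
# usage, so sanity-check free space beforehand
df -h /var/lib/cassandra/data
----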

RE: Tombstoned data seems to remain after compaction

Posted by "taka-t@fujitsu.com" <ta...@fujitsu.com>.
Hi Jeff, Kurt


Thanks again for your advice.

Among the valuable ideas you’ve provided, I’m thinking of executing nodetool compact
because it is the simplest one to try, and I’m really a novice with Cassandra.

One thing I’m concerned about is that the major compaction might
have a serious impact on our production system, which uses Cassandra as
storage for cached data such as web sessions.

We run a Cassandra ring with three nodes, replicating to all 3 nodes and using
the QUORUM consistency level on data updates.

Under the conditions above, are there any risks if I execute a major compaction
on each node one by one? Could the whole system’s throughput get seriously worse,
for example?

I know I’m asking a difficult question because the impact differs from
system to system, but any general advice is highly appreciated!
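
Concretely, the procedure I have in mind is something like the following
sketch (hostnames and table names are placeholders):

----
# Major-compact the table on one node at a time; nodetool compact
# blocks until the compaction on that node finishes
for host in node1 node2 node3; do
  nodetool -h "$host" compact my_keyspace my_table
  nodetool -h "$host" compactionstats   # confirm nothing is still running
done
----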




Regards,
Takashima



Re: Tombstoned data seems to remain after compaction

Posted by Jeff Jirsa <jj...@gmail.com>.
Hello Takashima,

Answers inline.

On Sun, Dec 10, 2017 at 11:41 PM, taka-t@fujitsu.com <ta...@fujitsu.com>
wrote:

> Hi Jeff
>
> I appreciate your detailed explanation :)
>
> > Expired data gets purged on compaction as long as it doesn’t overlap
> > with other live data. The overlap thing can be difficult to reason about,
> > but it’s meant to ensure correctness in the event that you write a value
> > with ttl 180, then another value with ttl 1, and you don’t want to remove
> > the value with ttl1 until you’ve also removed the value with ttl180, since
> > it would lead to data being resurrected
>
> I understand that the TTL setting sometimes does not work as we expect,
> especially when we alter the value afterward, because of Cassandra’s
> data-consistency functionality. Is my understanding correct?

If "does not work as you expect" you mean "data is not cleared immediately
upon expiration", that is correct.


> And I’m thinking of trying the sstablesplit utility to make Cassandra do
> a minor compaction, because one of the SSTables is the oldest and very
> large, so I want to compact it.

That is offline and requires downtime, which is usually not something you
want to do if you can avoid it.

Instead, I recommend you consider the tombstone compaction subproperties to
compaction, which let you force single-sstable compactions based on
tombstone percentage (and set that low enough that it reclaims the space
you want to reclaim).
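
For example (a sketch; the table name and thresholds are placeholders):

----
# Make single-sstable tombstone compactions more aggressive:
#  - tombstone_threshold: estimated droppable-tombstone ratio at which
#    a lone sstable becomes eligible (default 0.2)
#  - unchecked_tombstone_compaction: compact the sstable even when the
#    overlap check suggests the tombstones may not yet be droppable
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'tombstone_threshold': '0.1',
  'unchecked_tombstone_compaction': 'true'
};"
----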

Perhaps counterintuitively, compaction is most effective at freeing up
space when it makes one very big file, compared to lots of little files -
sstablesplit is probably not a good idea. A major compaction may help, if
you have the extra IO and disk space.

Again, though, you should probably consider using something other than STCS
going forward.

Re: Tombstoned data seems to remain after compaction

Posted by kurt greaves <ku...@instaclustr.com>.
It might... If you have the disk space, a major compaction would be better,
or user-defined compactions with the large/old SSTable. Better yet, if
you're on a recent version you can do a splitting major compaction (all
these options are available through *nodetool compact*).
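
Sketches of those options (keyspace, table, and sstable paths are
placeholders; flags may vary slightly by version):

----
# Plain major compaction of one table
nodetool compact my_keyspace my_table

# Splitting major compaction: -s/--split-output writes the result as
# several size-tiered sstables instead of one huge file
nodetool compact -s my_keyspace my_table

# User-defined compaction of just the large/old sstable (path is a
# placeholder; on versions where nodetool lacks this flag, the same
# operation is exposed as forceUserDefinedCompaction on the
# CompactionManager JMX MBean)
nodetool compact --user-defined /var/lib/cassandra/data/my_keyspace/my_table-1a2b3c/mc-1234-big-Data.db
----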



RE: Tombstoned data seems to remain after compaction

Posted by "taka-t@fujitsu.com" <ta...@fujitsu.com>.
Hi Jeff


I appreciate your detailed explanation :)

> Expired data gets purged on compaction as long as it doesn’t overlap with other live data. The overlap thing can be difficult to reason about, but it’s meant to ensure correctness in the event that you write a value with ttl 180, then another value with ttl 1, and you don’t want to remove the value with ttl1 until you’ve also removed the value with ttl180, since it would lead to data being resurrected

I understand that the TTL setting sometimes does not work as we expect, especially when we alter the
value afterward, because of Cassandra’s data-consistency functionality. Is my understanding correct?


And I’m thinking of trying the sstablesplit utility to make Cassandra do a minor compaction, because one of the
SSTables is the oldest and very large, so I want to compact it.

Do you think my plan works as expected?




Regards,
Takashima


Re: Tombstoned data seems to remain after compaction

Posted by Jeff Jirsa <jj...@gmail.com>.
Replies inline 


> On Dec 10, 2017, at 9:59 PM, "taka-t@fujitsu.com" <ta...@fujitsu.com> wrote:
>
> Hi Jeff,
>
> > Are all of your writes TTL’d in this table?
>
> Yes. We set TTL to 180 days at first, and then altered it to just 1 day
> because we noticed the first TTL setting was too long.

Ok this is different - Kurt’s answer is true when you issue explicit deletes. Expiring data is slightly different.

Expired data gets purged on compaction as long as it doesn’t overlap with other live data. The overlap thing can be difficult to reason about, but it’s meant to ensure correctness in the event that you write a value with ttl 180, then another value with ttl 1, and you don’t want to remove the value with ttl1 until you’ve also removed the value with ttl180, since it would lead to data being resurrected

This is the primary reason that ttl’d data doesn’t get cleaned up when people expect
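
To make the overlap concrete, a hypothetical pair of writes to the same
partition (illustrative table and values):

----
# Two cells in one partition with very different TTLs (hypothetical
# table). Once the 1-day cell expires, it cannot be purged from its
# sstable until compaction also covers the sstable holding the
# overlapping 180-day cell.
cqlsh -e "
  INSERT INTO my_keyspace.sessions (id, long_lived)
  VALUES ('abc', 'x') USING TTL 15552000;  -- 180 days
  INSERT INTO my_keyspace.sessions (id, short_lived)
  VALUES ('abc', 'y') USING TTL 86400;     -- 1 day
"
----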

 

> > Which compaction strategy are you using?
>
> We use the Size Tiered Compaction Strategy.

LCS would compact more aggressively and try to minimize overlaps

TWCS is designed for expiring data and tries to group data by time window for more efficient expiration.

You would likely benefit from changing to either of those - but you’ll want to try it on a single node first to confirm (should be able to find videos online about using JMX to change the compaction strategy of a single node)
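
For example, the cluster-wide switch would be a schema change like this
(a sketch; tune the window settings to your TTL):

----
# Switch the table to TWCS via a schema change (affects all nodes).
# With a 1-day TTL, daily windows mean whole expired sstables can be
# dropped once their window has fully expired.
cqlsh -e "ALTER TABLE my_keyspace.my_table WITH compaction = {
  'class': 'TimeWindowCompactionStrategy',
  'compaction_window_unit': 'DAYS',
  'compaction_window_size': '1'
};"

# To trial it on a single node first without changing the schema, set
# that table's CompactionParametersJson attribute over JMX (MBean
# org.apache.cassandra.db:type=Tables,keyspace=my_keyspace,table=my_table
# on 3.x) and watch that node before altering the schema cluster-wide.
----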

> > Are you asking these questions because you’re running out of space faster
> > than you expect and you’d like to expire data faster?
>
> You’re right. We want to know the reason and how to purge that old data soon,
> if possible. And first, I want to understand why the old records reported by
> the sstablemetadata command persist in the sstable data file.

Not to self promote too much, but I’ve given a few talks on running time series Cassandra clusters. These slides https://www.slideshare.net/mobile/JeffJirsa1/using-time-window-compaction-strategy-for-time-series-workloads (in video form here, https://m.youtube.com/watch?v=PWtekUWCIaw ) may be useful. 



RE: Tombstoned data seems to remain after compaction

Posted by "taka-t@fujitsu.com" <ta...@fujitsu.com>.
Hi Jeff,


> Are all of your writes TTL’d in this table?

Yes. We set TTL to 180 days at first, and then altered it to just 1 day because we noticed the first TTL
setting was too long.

> Which compaction strategy are you using?

We use the Size Tiered Compaction Strategy.

> Are you asking these questions because you’re running out of space faster than you expect and you’d like to expire data faster?

You’re right. We want to know the reason and how to purge that old data soon, if possible.
And first, I want to understand why the old records reported by the sstablemetadata command persist in the sstable data file.


B.T.W., I’m sorry, but please let me ask the question again.
Here is an excerpt of sstablemetadata output below.

Does the section “Estimated tombstone drop times” mean that the sstable contains tombstones for those records that should expire
on the date in the 1st column? And that the data might exist in other SSTables?

(excerpt)
----
Estimated tombstone drop times:%n
1510934467:      2475 * 2017.11.18
1510965112:       135
1510983500:       225
1511003962:       105
1511021113:      2280
1511037818:        30
1511055563:       120
----
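
By the way, I suppose I can check which SSTables hold a given partition
with something like this (keyspace, table, and key are placeholders):

----
# List the sstables that contain a given partition key; if the data and
# its tombstone sit in different files, the tombstone survives until
# those files are compacted together
nodetool getsstables my_keyspace my_table some_partition_key
----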




Regards,
Takashima



Re: Tombstoned data seems to remain after compaction

Posted by Jeff Jirsa <jj...@gmail.com>.
Mutations read during boot won’t go into the memtable unless the mutation is in the commitlog (which usually means fairly recent - they’re a fixed size)

Are all of your writes TTL’d in this table?
Which compaction strategy are you using?
Are you asking these questions because you’re running out of space faster than you expect and you’d like to expire data faster? 


-- 
Jeff Jirsa



RE: Tombstoned data seems to remain after compaction

Posted by "taka-t@fujitsu.com" <ta...@fujitsu.com>.
Hi Kurt,


Thanks for your reply!

“””
The tombstone needs to compact with every SSTable that contains data for the corresponding tombstone.
“””

Let me explain my understanding by example:

1. A record is inserted with a 180-day TTL (very long).

2. The record is saved to SSTable (A) when the server restarts, or on some event like that.

3. After 180 days pass, the Cassandra process reads SSTable (A) during its boot process (or on read access?) and puts a tombstone for the record in *memory*.

4. The tombstone in *memory* is saved to SSTable (B) the next time the server is rebooted.

The procedure above splits the record itself and its tombstone across different SSTables.

Is my understanding correct?



Regards,
Takashima



Re: Tombstoned data seems to remain after compaction

Posted by kurt greaves <ku...@instaclustr.com>.
The tombstone needs to compact with every SSTable that contains data for
the corresponding tombstone. For example the tombstone may be in that
SSTable but some data the tombstone covers may possibly be in another
SSTable. Only once all SSTables that contain relevant data have been
compacted with the SSTable containing the tombstone can the tombstone be
removed.
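
You can also see how many of those tombstones each sstable is still
carrying via sstablemetadata (a sketch; the Data.db filename is a
placeholder):

----
# "Estimated droppable tombstones" counts tombstones already past
# gc_grace that are still in this file; they can only be purged once
# the overlapping sstables are compacted together with it
sstablemetadata mc-1234-big-Data.db | grep -i droppable
----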
