Posted to user@cassandra.apache.org by "Lu, Boying" <Bo...@dell.com> on 2016/11/14 08:02:21 UTC

Some questions to updating and tombstone

Hi, All,

Will Cassandra generate a new tombstone when a column is updated with a CQL UPDATE statement?

And is there any way to get the number of tombstones in a column family? We want to avoid generating
too many tombstones within gc_grace_period.

Thanks

Boying

RE: Some questions to updating and tombstone

Posted by "Lu, Boying" <Bo...@dell.com>.
Many thanks to all of you. I'll study the blog post.



Re: Some questions to updating and tombstone

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Boying,

> Old value is not tombstone, but remains until compaction

Be careful: the above is generally true, but not always.

Tombstones can actually be generated by an UPDATE in some corner cases,
for example when using collections or prepared statements.
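As a rough illustration of those corner cases, here is a toy Python model of
what an UPDATE ends up writing. It is not Cassandra code: `cells_for_update`,
`count_tombstones` and the tuple layout are invented for this sketch. It only
encodes two rules of thumb: a null bound to a column is stored as a tombstone,
and replacing a whole non-frozen collection first deletes the old content.

```python
# Toy model only: names and data layout are hypothetical, not Cassandra internals.
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def cells_for_update(assignments, timestamp):
    """Return the (column, value, timestamp) cells a hypothetical UPDATE writes.

    assignments maps column name -> new value. None stands for a null bound
    in a prepared statement; a list/set/dict stands for replacing a whole
    non-frozen collection.
    """
    cells = []
    for column, value in assignments.items():
        if value is None:
            # A null value is stored as a tombstone for that column.
            cells.append((column, TOMBSTONE, timestamp))
        elif isinstance(value, (list, set, dict)):
            # Replacing a whole collection first deletes the old content
            # (one tombstone, placed just before the write), then adds the new one.
            cells.append((column, TOMBSTONE, timestamp - 1))
            cells.append((column, value, timestamp))
        else:
            # Plain overwrite: a new cell, no tombstone.
            cells.append((column, value, timestamp))
    return cells


def count_tombstones(cells):
    """Count the deletion markers among the written cells."""
    return sum(1 for _, value, _ in cells if value is TOMBSTONE)
```

In this model, updating only plain columns with non-null values writes no
tombstone, while an update that binds a null or overwrites a list writes one
tombstone per such column.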

I wrote a detailed blog post about deletes and tombstones in Cassandra
precisely to avoid answering this kind of question again and again on the
mailing list, as explaining correctly is a bit hard and I am a lazy guy. I
also talked about it at the last Cassandra summit. If you are going to use
Cassandra (and deletes) I think one of these might be of interest to you:

http://thelastpickle.com/blog/2016/07/27/about-deletes-and-tombstones.html
https://www.youtube.com/watch?v=lReTEcnzl7Y

If you still have questions after reading it, I would be very pleased to
help you further, but I believe this should be helpful.

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



Re: Some questions to updating and tombstone

Posted by Shalom Sagges <sh...@liveperson.com>.
Hi Fabrice,

Just a small (off-topic) question I couldn't find an answer to: what is a
slice in Cassandra? (e.g. "Maximum tombstones per slice")

Thanks!


Shalom Sagges
DBA
T: +972-74-700-4035



-- 
This message may contain confidential and/or privileged information. 
If you are not the addressee or authorized to receive this on behalf of the 
addressee you must not use, copy, disclose or take action based on this 
message or any information herein. 
If you have received this message in error, please advise the sender 
immediately by reply email and delete this message. Thank you.

Re: Some questions to updating and tombstone

Posted by Fabrice Facorat <fa...@gmail.com>.
If you don't want tombstones, don't generate them ;)

More seriously, tombstones are generated when:
- doing a DELETE
- a TTL expires
- setting a column to NULL

However, tombstones are an issue only if you have many tombstones for the
same value (i.e. you keep overwriting the same values with data and
tombstones). Having 1 tombstone for 1 value is not an issue; having 1000
tombstones for 1 value is a problem. Does your use case really overwrite
data with DELETE or NULL?

That is why what you may want to know is how many tombstones you have on
average when reading a value. This is available in:
- nodetool cfstats ks.cf : "Average tombstones per slice" and "Maximum
  tombstones per slice"
- JMX :
  org.apache.cassandra.metrics:keyspace=<ks>,name=TombstoneScannedHistogram,scope=<cf>,type=ColumnFamily
  (attributes Max/Count/99thPercentile/Mean)
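If you want to track those two cfstats figures from a script, a minimal Python
sketch could look like the one below. The sample text only imitates the shape
of the cfstats output and its numbers are made up; the helper name
`tombstones_per_slice` is hypothetical, and the exact label wording may differ
between Cassandra versions, so treat the regular expression as an assumption.

```python
import re

# Sample text shaped like `nodetool cfstats` output; the values are invented.
SAMPLE = """\
Keyspace: ks
    Table: cf
        Average tombstones per slice (last five minutes): 3.5
        Maximum tombstones per slice (last five minutes): 120
"""

# Matches the two "tombstones per slice" lines and captures the number.
PATTERN = re.compile(
    r"^\s*(Average|Maximum) tombstones per slice[^:]*:\s*(\S+)",
    re.MULTILINE,
)


def tombstones_per_slice(cfstats_text):
    """Extract the average/maximum tombstones-per-slice figures as floats."""
    return {
        kind.lower(): float(value)
        for kind, value in PATTERN.findall(cfstats_text)
    }


print(tombstones_per_slice(SAMPLE))
```

To feed it real data, the text could come from running `nodetool cfstats ks.cf`
through `subprocess.run(..., capture_output=True, text=True)` and passing its
`stdout` to the function.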




-- 
Close the World, Open the Net
http://www.linux-wizard.net

RE: Some questions to updating and tombstone

Posted by "Lu, Boying" <Bo...@dell.com>.
Thanks a lot for your help.

We are using the STCS compaction strategy and are not using TTLs.

Is there any API that we can use to query the current number of tombstones in a CF?





Re: Some questions to updating and tombstone

Posted by Anuj Wadehra <an...@yahoo.co.in>.
Hi Boying,
I agree with Vladimir. If compaction does not soon compact the two SSTables that contain the updates, disk space will be wasted. For example, if the updates are not close together in time, the first update might already be in a big SSTable by the time the second update is written to a new small SSTable. STCS won't compact them together soon.
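To make the big-table/small-table point concrete, here is a simplified Python sketch of size-tiered bucketing. It is an approximation, not the real STCS code: the function name `stcs_buckets` is made up, the 0.5/1.5 bounds are meant to mirror the bucket_low/bucket_high defaults, and details such as min_sstable_size and the minimum number of SSTables per compaction are ignored.

```python
def stcs_buckets(sizes_mb, bucket_low=0.5, bucket_high=1.5):
    """Group SSTable sizes into buckets of similar size, roughly like STCS.

    An SSTable joins a bucket when its size lies within
    [bucket_low * avg, bucket_high * avg] of that bucket's current average.
    Only SSTables in the same bucket are candidates to be compacted together.
    """
    buckets = []  # each bucket is a list of sizes
    for size in sorted(sizes_mb):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one.
            buckets.append([size])
    return buckets


# One big SSTable holding the first update, three small recent ones.
print(stcs_buckets([1000, 10, 12, 11]))
```

The 1000 MB SSTable ends up alone in its bucket, so the old value it holds is not merged with the newer small SSTables until enough similarly sized SSTables exist.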
Just adding column values with a new timestamp shouldn't create any tombstones. But if data is not merged for a long time, disk space issues may arise. If you are on STCS, just to get an idea of the extent of the problem, you can run a major compaction and see how much disk space it reclaims (don't do this in production, as major compaction has its own side effects).
Which compaction strategy are you using? Are these updates done with TTL?
Thanks
Anuj 
 

Re: Some questions to updating and tombstone

Posted by Vladimir Yudovin <vl...@winguzone.com>.
Hi Boying,



UPDATE writes a new value with a new timestamp. The old value is not a tombstone, but it remains on disk until compaction. gc_grace_period is not related to this.
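A toy Python sketch of that last-write-wins behaviour is shown here. It is only a model: the dict-based "SSTables" and the helper names `merge_cell_versions` and `compact` are invented for illustration, and timestamp ties are not handled.

```python
def merge_cell_versions(versions):
    """Pick the surviving (timestamp, value) pair for one column.

    The version with the highest timestamp wins; nothing is turned into
    a tombstone, the older versions are simply dropped.
    """
    return max(versions, key=lambda version: version[0])


def compact(*sstables):
    """Merge toy SSTables (dicts of column -> (timestamp, value))."""
    merged = {}
    for sstable in sstables:
        for column, version in sstable.items():
            candidates = [version]
            if column in merged:
                candidates.append(merged[column])
            merged[column] = merge_cell_versions(candidates)
    return merged


sstable_1 = {"name": (100, "alice")}  # original write
sstable_2 = {"name": (200, "bob")}    # later UPDATE, flushed to another SSTable

# Until these two are compacted, both versions occupy disk space.
print(compact(sstable_1, sstable_2))
```

Reads apply the same rule across SSTables, which is why the stale value is never returned even though it is still on disk.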



Best regards, Vladimir Yudovin, 

Winguzone - Hosted Cloud Cassandra
Launch your cluster in minutes.




