Posted to user@cassandra.apache.org by Analia Lorenzatto <an...@gmail.com> on 2015/08/19 22:54:02 UTC

Question about how to remove data

Hello guys,

I have a Cassandra 2.1 cluster comprising 4 nodes.

I removed a lot of data from a column family, then manually ran a
compaction on that column family on every node. After doing that, if I
query that data, Cassandra correctly says it is not there. But the disk
space used is exactly the same as before I removed the data.

Also, I noticed that gc_grace_seconds = 0. Some people on the internet
say that this could produce zombie data. What do you think?

I do not have a TTL defined on the column family, and I am not able to
add one. So my question is: given that no TTL is defined, is the data
going to be removed? Or will the deleted data never actually be deleted
because I do not have a TTL?


Thanks in advance!

-- 
Saludos / Regards.

Analía Lorenzatto.

“It's possible to commit no errors and still lose. That is not weakness.
That is life.” By Captain Jean-Luc Picard.

Re: Question about how to remove data

Posted by Analia Lorenzatto <an...@gmail.com>.
Thanks for the answers, guys!

Saludos / Regards.

Analía Lorenzatto.

"Happiness is not something ready made. It comes from your own actions." by
the Dalai Lama


On 21 Aug 2015 2:31 pm, "Sebastian Estevez" <se...@datastax.com>
wrote:


Re: Question about how to remove data

Posted by Sebastian Estevez <se...@datastax.com>.
To clarify, you do not need a TTL for deletes to be compacted away in
Cassandra. When you delete, we create a tombstone, which will remain in the
system for __at least__ gc_grace_seconds. We wait this long to give the
tombstone a chance to reach all replica nodes. The best practice is to run
repairs at least once every gc_grace_seconds in order to prevent edge cases
where data comes back to life (i.e. the tombstone was never sent to one of
your replicas, and when the tombstones and data are removed from the other
two replicas, all that is left is the old value).
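The resurrection case above can be reproduced with a toy model. This is a hypothetical sketch, not Cassandra's storage engine: it assumes three replicas (matching the "other two replicas" wording), keeps one cell per key in a plain dict, uses integer timestamps, and substitutes made-up helpers (`write`, `compact`, `repair`, `read`) for the real write path, compaction and `nodetool repair`.

```python
from typing import Dict, List, Optional, Tuple

# Toy cell: (value, write_timestamp, local_deletion_time); value None marks a tombstone.
Cell = Tuple[Optional[str], int, Optional[int]]
Replica = Dict[str, Cell]


def newer(a: Cell, b: Cell) -> bool:
    """Last write wins; on equal timestamps the tombstone wins."""
    return (a[1], a[0] is None) > (b[1], b[0] is None)


def write(replica: Replica, key: str, cell: Cell) -> None:
    current = replica.get(key)
    if current is None or newer(cell, current):
        replica[key] = cell


def compact(replica: Replica, now: int, gc_grace_seconds: int) -> None:
    """Drop tombstones whose grace period has fully elapsed (hypothetical helper)."""
    for key, (value, _, deleted_at) in list(replica.items()):
        if value is None and deleted_at < now - gc_grace_seconds:
            del replica[key]


def winner(replicas: List[Replica], key: str) -> Optional[Cell]:
    best: Optional[Cell] = None
    for replica in replicas:
        cell = replica.get(key)
        if cell is not None and (best is None or newer(cell, best)):
            best = cell
    return best


def read(replicas: List[Replica], key: str) -> Optional[str]:
    """Reconcile every replica's copy, roughly what a read at consistency ALL sees."""
    best = winner(replicas, key)
    return None if best is None else best[0]


def repair(replicas: List[Replica], key: str) -> None:
    """Crude stand-in for nodetool repair: push the winning cell to every replica."""
    best = winner(replicas, key)
    if best is not None:
        for replica in replicas:
            write(replica, key, best)


def scenario(gc_grace_seconds: int, repair_first: bool) -> Optional[str]:
    replicas: List[Replica] = [{}, {}, {}]
    for r in replicas:
        write(r, "k", ("old", 0, None))  # t=0: value written to all three replicas
    for r in replicas[:2]:
        write(r, "k", (None, 10, 10))    # t=10: the delete never reaches replica 2
    if repair_first:
        repair(replicas, "k")            # t=20: repair runs inside the grace period
    for r in replicas:
        compact(r, now=30, gc_grace_seconds=gc_grace_seconds)  # t=30: compaction
    return read(replicas, "k")


print(scenario(0, repair_first=False))       # old   (the deleted value is back)
print(scenario(0, repair_first=True))        # None  (repair spread the tombstone first)
print(scenario(864000, repair_first=False))  # None  (tombstone still inside the 10-day grace period)
```

In this model, gc_grace_seconds = 0 is only safe if every replica is known to hold the tombstone before compaction runs; a single missed delete is enough to bring the value back.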

__at least__ are the key words in the previous paragraph: there are more
conditions that need to be met in order for a tombstone to actually get
cleaned up. As with most things in Cassandra, these conditions are
configurable (via the compaction sub-properties described here):

http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_configure_compaction_t.html
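A rough, simplified sketch of the two checks involved, not Cassandra's actual code. First, a tombstone can only be dropped once its grace period has passed and no SSTable left out of the compaction may still hold older data for the same partition. Second, the sub-properties `tombstone_threshold` (default 0.2), `tombstone_compaction_interval` (default 86400 seconds) and `unchecked_tombstone_compaction` (default false) decide whether a single SSTable gets compacted on its own just to drop tombstones. The `SSTable` class and both function names are hypothetical, and Cassandra's real overlap check is an estimate rather than the strict test used here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence


@dataclass(frozen=True)
class SSTable:
    """Hypothetical summary of one SSTable's metadata."""
    name: str
    partitions: FrozenSet[str]         # partition keys present (Cassandra consults bloom filters instead)
    min_timestamp: int                 # oldest write timestamp in the file
    created_at: int                    # creation time, in seconds
    droppable_tombstone_ratio: float   # estimated droppable-tombstone ratio


def tombstone_purgeable(key: str, tombstone_ts: int, deleted_at: int,
                        compacting: Sequence[SSTable], all_sstables: Sequence[SSTable],
                        now: int, gc_grace_seconds: int) -> bool:
    """Can this compaction drop the tombstone for `key`?"""
    if deleted_at >= now - gc_grace_seconds:
        return False  # still inside the grace period
    # An SSTable outside the compaction holding older data for this partition
    # still needs the tombstone to shadow that data.
    others = [s for s in all_sstables if s not in compacting and key in s.partitions]
    return all(tombstone_ts < s.min_timestamp for s in others)


def worth_tombstone_compaction(sstable: SSTable, overlapping: Sequence[SSTable], now: int,
                               tombstone_threshold: float = 0.2,
                               tombstone_compaction_interval: int = 86400,
                               unchecked_tombstone_compaction: bool = False) -> bool:
    """Should `sstable` be compacted on its own just to drop tombstones?"""
    if now - sstable.created_at < tombstone_compaction_interval:
        return False  # SSTable is too young
    if sstable.droppable_tombstone_ratio <= tombstone_threshold:
        return False  # not enough droppable tombstones
    # Simplification: Cassandra estimates how much the overlap would block purging
    # instead of refusing whenever any overlapping SSTable exists.
    return unchecked_tombstone_compaction or not overlapping
```

In this model, a major compaction (every SSTable in one pass) leaves nothing outside the compaction to block the purge, which is why it can reclaim space that smaller compactions cannot.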

All the best,



Sebastián Estévez

Solutions Architect | 954 905 8615 | sebastian.estevez@datastax.com


On Thu, Aug 20, 2015 at 4:13 PM, Daniel Chia <da...@coursera.org> wrote:


Re: Question about how to remove data

Posted by Daniel Chia <da...@coursera.org>.
The TTL shouldn't matter if you deleted the data since, to my understanding,
the delete should shadow the data, signaling to C* that the data is a
candidate for removal during compaction.
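A minimal sketch of that shadowing, assuming a major compaction that merges every SSTable of one column family into a single output. The dict-based SSTable representation and the `merge` helper are hypothetical; the point is that no TTL is involved: the tombstone alone marks the older value for removal, and both disappear once the tombstone is past `gc_grace_seconds`.

```python
from typing import Dict, Iterable, Optional, Tuple

# Toy SSTable: key -> (value, write_timestamp, local_deletion_time); value None is a tombstone.
Cell = Tuple[Optional[str], int, Optional[int]]
SSTable = Dict[str, Cell]


def merge(sstables: Iterable[SSTable], now: int, gc_grace_seconds: int) -> SSTable:
    """Merge SSTables the way a full (major) compaction would, in this toy model."""
    out: SSTable = {}
    for table in sstables:
        for key, cell in table.items():
            current = out.get(key)
            # The newer write shadows the older one; a tombstone wins a timestamp tie.
            if current is None or (cell[1], cell[0] is None) > (current[1], current[0] is None):
                out[key] = cell
    # Shadowed values were already discarded above; expired tombstones are dropped now.
    return {key: cell for key, cell in out.items()
            if not (cell[0] is None and cell[2] < now - gc_grace_seconds)}


old = {"k": ("v", 1, None), "j": ("w", 1, None)}  # live data, no TTL
newer = {"k": (None, 5, 5)}                        # DELETE of "k" at t=5
print(merge([old, newer], now=6, gc_grace_seconds=864000))  # tombstone for "k" kept, value gone
print(merge([old, newer], now=6, gc_grace_seconds=0))       # "k" gone entirely
```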

Others might know better, but the fact that gc_grace_seconds is 0 could
very well be what is causing your problems. You could also use
sstable2json to see the raw contents of the SSTables on disk and find out
why the data is still there.

Thanks,
Daniel

On Thu, Aug 20, 2015 at 12:55 PM, Analia Lorenzatto <
analialorenzatto@gmail.com> wrote:


Re: Question about how to remove data

Posted by Analia Lorenzatto <an...@gmail.com>.
Hello,

Daniel, I am using size-tiered compaction.

My concern is that I do not have a TTL defined on the column family, and I
am not able to add one. Could it be that the "deleted data" is never
actually going to be removed?

Thanks a lot!


On Thu, Aug 20, 2015 at 4:24 AM, Daniel Chia <da...@coursera.org> wrote:



-- 
Saludos / Regards.

Analía Lorenzatto.


Re: Question about how to remove data

Posted by Daniel Chia <da...@coursera.org>.
Is this an LCS column family, or size-tiered? Manually running compaction
on LCS doesn't do anything until C* 2.2
(https://issues.apache.org/jira/browse/CASSANDRA-7272).

Thanks,
Daniel

On Wed, Aug 19, 2015 at 6:56 PM, Analia Lorenzatto <
analialorenzatto@gmail.com> wrote:


Re: Question about how to remove data

Posted by Analia Lorenzatto <an...@gmail.com>.
Hello Michael,

Thanks for responding!

I do not have snapshots on any node of the cluster.

Saludos / Regards.

Analía Lorenzatto.


On 19 Aug 2015 6:19 pm, "Laing, Michael" <mi...@nytimes.com> wrote:


Re: Question about how to remove data

Posted by "Laing, Michael" <mi...@nytimes.com>.
Possibly you have snapshots? If so, use nodetool to clear them.
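Snapshots are hard links to SSTable files, so once compaction deletes the originals, a snapshot keeps the old data on disk. A minimal sketch for spotting that, assuming the 2.1 on-disk layout `<data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/` and the common default data directory `/var/lib/cassandra/data` (check `data_file_directories` in cassandra.yaml). The `snapshot_bytes` function is hypothetical; `nodetool clearsnapshot` is the command that actually removes snapshots.

```python
from pathlib import Path
from typing import Dict


def snapshot_bytes(data_dir: str) -> Dict[str, int]:
    """Total apparent size of each table's snapshots/ directory (hypothetical helper).

    Assumes <data_dir>/<keyspace>/<table>-<id>/snapshots/<tag>/<files>.
    Snapshot files are hard links, so bytes still shared with live SSTables are
    counted too; only files whose live copy was compacted away pin extra space.
    """
    totals: Dict[str, int] = {}
    for snap_dir in Path(data_dir).glob("*/*/snapshots"):
        keyspace, table_dir = snap_dir.parts[-3], snap_dir.parts[-2]
        size = sum(f.stat().st_size for f in snap_dir.rglob("*") if f.is_file())
        totals[f"{keyspace}/{table_dir}"] = size
    return totals


if __name__ == "__main__":
    for table, size in sorted(snapshot_bytes("/var/lib/cassandra/data").items()):
        print(f"{size:>15,d}  {table}")
```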

On Wed, Aug 19, 2015 at 4:54 PM, Analia Lorenzatto <
analialorenzatto@gmail.com> wrote:
