Posted to user@cassandra.apache.org by Roman Tkachenko <ro...@mailgunhq.com> on 2015/03/24 00:45:57 UTC

Deleted columns reappear after "repair"

Hey guys,

We're having a very strange issue: deleted columns get resurrected when
"repair" is run on a node.

Info about the setup. Cassandra 2.0.13, multi datacenter with 12 nodes in
one datacenter and 6 nodes in another one. Schema:

cqlsh> describe keyspace blackbook;

CREATE KEYSPACE blackbook WITH replication = {
  'class': 'NetworkTopologyStrategy',
  'IAD': '3',
  'ORD': '3'
};

USE blackbook;

CREATE TABLE bounces (
  domainid text,
  address text,
  message text,
  "timestamp" bigint,
  PRIMARY KEY (domainid, address)
) WITH
  bloom_filter_fp_chance=0.100000 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.100000 AND
  gc_grace_seconds=864000 AND
  index_interval=128 AND
  read_repair_chance=0.000000 AND
  populate_io_cache_on_flush='false' AND
  default_time_to_live=0 AND
  speculative_retry='99.0PERCENTILE' AND
  memtable_flush_period_in_ms=0 AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'sstable_compression': 'LZ4Compressor'};

We're using wide rows for the "bounces" table, which can store hundreds of
thousands of addresses for each "domainid" (in practice it's usually much
less, but some rows may contain up to several million columns).

All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are
deleted from the table using the following CQL3 statement:

delete from bounces where domainid = 'domain.com' and address = 'alice@example.com';

But the thing is, after "repair" is run on any node that owns the
"domain.com" key, the column gets resurrected on all nodes as if the
tombstone had disappeared. We checked this multiple times using cqlsh: issue
a delete statement and verify that the data is not returned; then run
"repair" and the deleted data is returned again.
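
In cqlsh terms the check is roughly this (a sketch using the example values
from above, not an exact transcript):

cqlsh:blackbook> delete from bounces where domainid = 'domain.com' and address = 'alice@example.com';
cqlsh:blackbook> select * from bounces where domainid = 'domain.com' and address = 'alice@example.com';
(0 rows)
-- then, on a node owning the key: nodetool repair blackbook bounces
cqlsh:blackbook> select * from bounces where domainid = 'domain.com' and address = 'alice@example.com';
-- the deleted row comes back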

Our gc_grace_seconds is set to the default value and no nodes were ever
down for anywhere close to 10 days, so it doesn't look like that's related.
We also made sure all our servers are running ntpd, so time synchronization
should not be an issue either.

Have you guys ever seen anything like this / have any idea as to what may
be causing this behavior? What could make a tombstone disappear during a
"repair" operation?

Thanks for your help. Let me know if I can provide more information.

Roman

Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Yep, good point: https://issues.apache.org/jira/browse/CASSANDRA-9045.

On Thu, Mar 26, 2015 at 4:23 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Mar 25, 2015 at 6:53 PM, Roman Tkachenko <ro...@mailgunhq.com>
> wrote:
>
>> Yup, I increased "in_memory_compaction_limit_in_mb" to 512MB so the row
>> in question fits into it and ran repair on a couple of nodes owning its
>> key. The log entries about this particular row went away and those columns
>> haven't reappeared, yet. If that was the reason, that's unfortunate cause
>> we have rows much larger than 512MB and it'd effectively mean nothing can
>> be deleted from them... Can't increase this parameter forever.
>>
>> I'm gonna go ahead and file a report at JIRA.
>>
>
> It would be greatly appreciated by future searchers if you inform the
> thread of the JIRA url assigned this issue, when you file it. :D
>
> =Rob
>
>

Re: Deleted columns reappear after "repair"

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Mar 25, 2015 at 6:53 PM, Roman Tkachenko <ro...@mailgunhq.com>
wrote:

> Yup, I increased "in_memory_compaction_limit_in_mb" to 512MB so the row in
> question fits into it and ran repair on a couple of nodes owning its key.
> The log entries about this particular row went away and those columns
> haven't reappeared, yet. If that was the reason, that's unfortunate cause
> we have rows much larger than 512MB and it'd effectively mean nothing can
> be deleted from them... Can't increase this parameter forever.
>
> I'm gonna go ahead and file a report at JIRA.
>

It would be greatly appreciated by future searchers if you let the thread
know the JIRA URL assigned to this issue when you file it. :D

=Rob

Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Thanks Robert.

Yup, I increased "in_memory_compaction_limit_in_mb" to 512MB so the row in
question fits into it, and ran repair on a couple of nodes owning its key.
The log entries about this particular row went away and those columns
haven't reappeared yet. If that was the reason, it's unfortunate, because we
have rows much larger than 512MB, and it'd effectively mean nothing can be
deleted from them... we can't increase this parameter forever.
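
For reference, the change itself is just this one line in cassandra.yaml (a
sketch; it has to be applied on each node and needs a restart to take effect):

# cassandra.yaml -- default is 64
in_memory_compaction_limit_in_mb: 512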

I'm gonna go ahead and file a report at JIRA.

Roman

On Wed, Mar 25, 2015 at 4:11 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Mar 25, 2015 at 1:57 PM, Roman Tkachenko <ro...@mailgunhq.com>
> wrote:
>
>> Okay, so I'm positively going crazy :)
>>
>> Increasing gc_grace + repair + decreasing gc_grace didn't help. The
>> columns still appear after the repair. I checked in cassandra-cli and
>> timestamps for these columns are old, not in the future, so it shouldn't be
>> the reason.
>>
>> I also did a test: updated one of columns and it was indeed updated. Then
>> deleted it (and it was deleted), ran repair and its "updated" version
>> reappeared again! Why wouldn't these columns just go away? Is there any way
>> I can force their deletion permanently?
>>
>
> It sounds like you have done enough sanity checking of your use of
> Cassandra to consider filing this issue as a JIRA on the issues.apache.org
> JIRA.
>
> The fact that it seems to only affect a row that is being compacted
> incrementally is an interesting datapoint...
>
> =Rob
>
>

Re: Deleted columns reappear after "repair"

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Mar 25, 2015 at 1:57 PM, Roman Tkachenko <ro...@mailgunhq.com>
wrote:

> Okay, so I'm positively going crazy :)
>
> Increasing gc_grace + repair + decreasing gc_grace didn't help. The
> columns still appear after the repair. I checked in cassandra-cli and
> timestamps for these columns are old, not in the future, so it shouldn't be
> the reason.
>
> I also did a test: updated one of columns and it was indeed updated. Then
> deleted it (and it was deleted), ran repair and its "updated" version
> reappeared again! Why wouldn't these columns just go away? Is there any way
> I can force their deletion permanently?
>

It sounds like you have done enough sanity checking of your use of
Cassandra to consider filing this issue on the issues.apache.org JIRA.

The fact that it seems to only affect a row that is being compacted
incrementally is an interesting datapoint...

=Rob

Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Okay, so I'm positively going crazy :)

Increasing gc_grace + repair + decreasing gc_grace didn't help. The columns
still appear after the repair. I checked in cassandra-cli and the timestamps
for these columns are old, not in the future, so that shouldn't be the
reason.
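
(For the record, the same timestamp check can be done from cqlsh with
something like the query below; writetime() only works on non-primary-key
columns, hence "message". The returned value is microseconds since epoch,
so it can be compared against the current time directly.)

select address, writetime(message) from bounces where domainid = 'domain.com' and address = 'alice@example.com';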

I also did a test: updated one of the columns and it was indeed updated.
Then I deleted it (and it was deleted), ran repair, and its "updated"
version reappeared again! Why won't these columns just go away? Is there any
way I can force their permanent deletion?
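
The test sequence was roughly this (a sketch, not the exact statements):

update bounces set message = 'test' where domainid = 'domain.com' and address = 'alice@example.com';
-- select returns the updated value
delete from bounces where domainid = 'domain.com' and address = 'alice@example.com';
-- select returns nothing
-- run "nodetool repair blackbook bounces" on a node owning the key
-- select now returns the "updated" value again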

I also see this log entry on the node I'm running repair on; it mentions
the row that contains the reappearing columns:

INFO [ValidationExecutor:77] 2015-03-25 20:27:43,936
CompactionController.java (line 192) Compacting large row
blackbook/bounces:4ed558feba8a483733001d6a (279067683 bytes) incrementally

Can it be related to the issue?


On Tue, Mar 24, 2015 at 11:00 AM, Roman Tkachenko <ro...@mailgunhq.com>
wrote:

> Well, as I mentioned in my original email all machines running Cassandra
> are running NTP. This was one of the first things I verified and I triple
> checked that they all show the same time. Is this sufficient to ensure
> clocks are synched between the nodes?
>
> I have increased gc_grace to 100 days for now and am running repair on the
> affected keyspace, it should be done today. In the meanwhile if you (or
> anyone else) have other ideas / suggestions on how to debug this, they're
> much appreciated.
>
> Thanks for your help!
>
> Roman
>
> On Tue, Mar 24, 2015 at 10:39 AM, Duncan Sands <du...@gmail.com>
> wrote:
>
>> Hi Roman,
>>
>> On 24/03/15 18:05, Roman Tkachenko wrote:
>>
>>> Hi Duncan,
>>>
>>> Thanks for the response!
>>>
>>> I can try increasing gc_grace_seconds and run repair on all nodes. It
>>> does not
>>> make sense though why all *new* deletes (for the same column that
>>> resurrects
>>> after repair) I do are forgotten as well after repair? Doesn't Cassandra
>>> insert
>>> a new tombstone every time delete happens?
>>>
>>
>> it does.  Maybe the data you are trying to delete has a timestamp
>> (writetime) in the future, for example because clocks aren't synchronized
>> between your nodes.
>>
>>
>>> Also, how do I find out the value to set gc_grace_seconds to?
>>>
>>
>> It needs to be big enough that you are sure to repair your entire cluster
>> in less than that time.  For example, observe how long repairing the entire
>> cluster takes and multiply by 3 or 4 (in case a repair fails or is
>> interrupted one day).
>>
>> Once incremental repair is solid maybe the whole gc_grace thing will
>> eventually go away, eg by modifying C* to only drop known repaired
>> tombstones.
>>
>> Ciao, Duncan.
>>
>>
>>> Thanks.
>>>
>>> On Tue, Mar 24, 2015 at 9:38 AM, Duncan Sands <duncan.sands@gmail.com
>>> <ma...@gmail.com>> wrote:
>>>
>>>     Hi Roman,
>>>
>>>     On 24/03/15 17:32, Roman Tkachenko wrote:
>>>
>>>         Hey guys,
>>>
>>>         Has anyone seen anything like this behavior or has an
>>> explanation for it? If
>>>         not, I think I'm gonna file a bug report.
>>>
>>>
>>>     this can happen if repair is run after the tombstone gc_grace_period
>>> has
>>>     expired.  I suggest you increase gc_grace_period.
>>>
>>>     Ciao, Duncan.
>>>
>>>
>>>
>>
>

Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Well, as I mentioned in my original email, all machines running Cassandra
are running NTP. This was one of the first things I verified, and I triple
checked that they all show the same time. Is this sufficient to ensure
clocks are synced between the nodes?

I have increased gc_grace to 100 days for now and am running repair on the
affected keyspace; it should be done today. In the meantime, if you (or
anyone else) have other ideas / suggestions on how to debug this, they're
much appreciated.
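
The change itself was roughly this (100 days = 8640000 seconds):

alter table blackbook.bounces with gc_grace_seconds = 8640000;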

Thanks for your help!

Roman

On Tue, Mar 24, 2015 at 10:39 AM, Duncan Sands <du...@gmail.com>
wrote:

> Hi Roman,
>
> On 24/03/15 18:05, Roman Tkachenko wrote:
>
>> Hi Duncan,
>>
>> Thanks for the response!
>>
>> I can try increasing gc_grace_seconds and run repair on all nodes. It
>> does not
>> make sense though why all *new* deletes (for the same column that
>> resurrects
>> after repair) I do are forgotten as well after repair? Doesn't Cassandra
>> insert
>> a new tombstone every time delete happens?
>>
>
> it does.  Maybe the data you are trying to delete has a timestamp
> (writetime) in the future, for example because clocks aren't synchronized
> between your nodes.
>
>
>> Also, how do I find out the value to set gc_grace_seconds to?
>>
>
> It needs to be big enough that you are sure to repair your entire cluster
> in less than that time.  For example, observe how long repairing the entire
> cluster takes and multiply by 3 or 4 (in case a repair fails or is
> interrupted one day).
>
> Once incremental repair is solid maybe the whole gc_grace thing will
> eventually go away, eg by modifying C* to only drop known repaired
> tombstones.
>
> Ciao, Duncan.
>
>
>> Thanks.
>>
>> On Tue, Mar 24, 2015 at 9:38 AM, Duncan Sands <duncan.sands@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Hi Roman,
>>
>>     On 24/03/15 17:32, Roman Tkachenko wrote:
>>
>>         Hey guys,
>>
>>         Has anyone seen anything like this behavior or has an explanation
>> for it? If
>>         not, I think I'm gonna file a bug report.
>>
>>
>>     this can happen if repair is run after the tombstone gc_grace_period
>> has
>>     expired.  I suggest you increase gc_grace_period.
>>
>>     Ciao, Duncan.
>>
>>
>>
>

Re: Deleted columns reappear after "repair"

Posted by Duncan Sands <du...@gmail.com>.
Hi Roman,

On 24/03/15 18:05, Roman Tkachenko wrote:
> Hi Duncan,
>
> Thanks for the response!
>
> I can try increasing gc_grace_seconds and run repair on all nodes. It does not
> make sense though why all *new* deletes (for the same column that resurrects
> after repair) I do are forgotten as well after repair? Doesn't Cassandra insert
> a new tombstone every time delete happens?

it does.  Maybe the data you are trying to delete has a timestamp (writetime) in 
the future, for example because clocks aren't synchronized between your nodes.

>
> Also, how do I find out the value to set gc_grace_seconds to?

It needs to be big enough that you are sure to repair your entire cluster in 
less than that time.  For example, observe how long repairing the entire cluster 
takes and multiply by 3 or 4 (in case a repair fails or is interrupted one day).
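
To put illustrative numbers on it: if a full repair of the cluster takes about
3 days, then 3 days * 4 = 12 days = 12 * 86400 = 1036800 seconds, so something
like:

alter table blackbook.bounces with gc_grace_seconds = 1036800;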

Once incremental repair is solid, maybe the whole gc_grace thing will
eventually go away, e.g. by modifying C* to only drop known repaired
tombstones.

Ciao, Duncan.

>
> Thanks.
>
> On Tue, Mar 24, 2015 at 9:38 AM, Duncan Sands <duncan.sands@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Hi Roman,
>
>     On 24/03/15 17:32, Roman Tkachenko wrote:
>
>         Hey guys,
>
>         Has anyone seen anything like this behavior or has an explanation for it? If
>         not, I think I'm gonna file a bug report.
>
>
>     this can happen if repair is run after the tombstone gc_grace_period has
>     expired.  I suggest you increase gc_grace_period.
>
>     Ciao, Duncan.
>
>


Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Hi Duncan,

Thanks for the response!

I can try increasing gc_grace_seconds and running repair on all nodes. What
does not make sense, though, is why all *new* deletes I issue (for the same
column that resurrects after repair) are forgotten after repair as well.
Doesn't Cassandra insert a new tombstone every time a delete happens?

Also, how do I find out the value to set gc_grace_seconds to?

Thanks.

On Tue, Mar 24, 2015 at 9:38 AM, Duncan Sands <du...@gmail.com>
wrote:

> Hi Roman,
>
> On 24/03/15 17:32, Roman Tkachenko wrote:
>
>> Hey guys,
>>
>> Has anyone seen anything like this behavior or has an explanation for it?
>> If
>> not, I think I'm gonna file a bug report.
>>
>
> this can happen if repair is run after the tombstone gc_grace_period has
> expired.  I suggest you increase gc_grace_period.
>
> Ciao, Duncan.
>

Re: Deleted columns reappear after "repair"

Posted by Duncan Sands <du...@gmail.com>.
Hi Roman,

On 24/03/15 17:32, Roman Tkachenko wrote:
> Hey guys,
>
> Has anyone seen anything like this behavior or has an explanation for it? If
> not, I think I'm gonna file a bug report.

This can happen if repair is run after the tombstone's gc_grace_seconds has
expired.  I suggest you increase gc_grace_seconds.

Ciao, Duncan.

Re: Deleted columns reappear after "repair"

Posted by Roman Tkachenko <ro...@mailgunhq.com>.
Hey guys,

Has anyone seen anything like this behavior, or does anyone have an
explanation for it? If not, I think I'm gonna file a bug report.

Thanks!

Roman

On Mon, Mar 23, 2015 at 4:45 PM, Roman Tkachenko <ro...@mailgunhq.com>
wrote:

> Hey guys,
>
> We're having a very strange issue: deleted columns get resurrected when
> "repair" is run on a node.
>
> Info about the setup. Cassandra 2.0.13, multi datacenter with 12 nodes in
> one datacenter and 6 nodes in another one. Schema:
>
> cqlsh> describe keyspace blackbook;
>
> CREATE KEYSPACE blackbook WITH replication = {
>   'class': 'NetworkTopologyStrategy',
>   'IAD': '3',
>   'ORD': '3'
> };
>
> USE blackbook;
>
> CREATE TABLE bounces (
>   domainid text,
>   address text,
>   message text,
>   "timestamp" bigint,
>   PRIMARY KEY (domainid, address)
> ) WITH
>   bloom_filter_fp_chance=0.100000 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.100000 AND
>   gc_grace_seconds=864000 AND
>   index_interval=128 AND
>   read_repair_chance=0.000000 AND
>   populate_io_cache_on_flush='false' AND
>   default_time_to_live=0 AND
>   speculative_retry='99.0PERCENTILE' AND
>   memtable_flush_period_in_ms=0 AND
>   compaction={'class': 'LeveledCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
>
> We're using wide rows for the "bounces" table that can store hundreds of
> thousands of addresses for each "domainid" (in practice it's much less
> usually, but some rows may contain up to several million columns).
>
> All queries are done using LOCAL_QUORUM consistency. Sometimes bounces are
> deleted from the table using the following CQL3 statement:
>
> delete from bounces where domainid = 'domain.com' and address = '
> alice@example.com';
>
> But the thing is, after "repair" is run on any node that owns "domain.com"
> key, the column gets resurrected on all nodes as if the tombstone has
> disappeared. We checked this multiple times using cqlsh: issue a delete
> statement and verify that data is not returned; then run "repair" and the
> deleted data is returned again.
>
> Our gc_grace_seconds is of the default value and no nodes ever were down
> for anywhere close to 10 days, so it doesn't look like it's related. We
> also made sure all our servers are running ntpd so time synchronization
> should not be an issue as well.
>
> Have you guys ever seen anything like this / have any idea as to what may
> be causing this behavior? What could make "tombstone" disappear during
> "repair" operation?
>
> Thanks for your help. Let me know if I can provide more information.
>
> Roman
>