You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Per Otterström (JIRA)" <ji...@apache.org> on 2019/06/18 12:04:00 UTC

[jira] [Comment Edited] (CASSANDRA-15159) Purgable tombstones are ignored by read query

    [ https://issues.apache.org/jira/browse/CASSANDRA-15159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16866546#comment-16866546 ] 

Per Otterström edited comment on CASSANDRA-15159 at 6/18/19 12:03 PM:
----------------------------------------------------------------------

I had a look at this ticket [~laxmikant99].

I'm able to reproduce with your instructions on {{trunk}} as well. It's nothing related to TTL, so a simple delete will give the same result.

From what I can tell, this behavior comes from {{ReadCommand.withoutPurgeableTombstones()}}. Given the title of this ticket, I'm guessing you arrived at the same conclusion. :-)

That method has the following comment:
{code:java}
// Skip purgeable tombstones. We do this because it's safe to do (post-merge of the memtable and sstable at least), it
// can save us some bandwith, and avoid making us throw a TombstoneOverwhelmingException for purgeable tombstones (which
// are to some extend an artefact of compaction lagging behind and hence counting them is somewhat unintuitive).
{code}

So, this seem very intentional. Reasons I can think of for this behavior are:
* The query result is consistent whether compaction happened or not - after all, gc grace has expired
* And as mentioned in the comment; overwhelming amount of tombstones and bandwidth

At the same time, it is annoying to get back resurrected data, when you *know* the tombstone is still there.

But what worries me the most is that a read-repair will be triggered on every read and spread the resurrected data.

{noformat}
cqlsh> select * from test.table1 where key = 'key2';

 key  | col | val
------+-----+-----
 key2 | abc | xyz

(1 rows)

Tracing session: 4f1a9b90-91b6-11e9-a0ff-9d318f160025

 activity                                                                                                                      | timestamp                  | source    | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                            Execute CQL3 query | 2019-06-18 12:46:20.873000 | 127.0.0.1 |              0 | 127.0.0.1
                                           Parsing select * from test.table1 where key = 'key2'; [Native-Transport-Requests-1] | 2019-06-18 12:46:20.874000 | 127.0.0.1 |            358 | 127.0.0.1
                                                                             Preparing statement [Native-Transport-Requests-1] | 2019-06-18 12:46:20.875000 | 127.0.0.1 |           1331 | 127.0.0.1
                                                              reading digest from 127.0.0.3:7000 [Native-Transport-Requests-1] | 2019-06-18 12:46:20.875000 | 127.0.0.1 |           1922 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2048 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2166 | 127.0.0.1
                                                          Sending READ_REQ message to 127.0.0.3:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2212 | 127.0.0.1
                                                                                     Key cache hit for sstable 2 [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2376 | 127.0.0.1
                                       Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2542 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2772 | 127.0.0.1
                                                                          Read 0 live rows and 0 tombstone cells [ReadStage-2] | 2019-06-18 12:46:20.876001 | 127.0.0.1 |           2899 | 127.0.0.1
                                                       READ_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.877000 | 127.0.0.3 |             73 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1148 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1401 | 127.0.0.1
                                       Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1637 | 127.0.0.1
                                                                                     Key cache hit for sstable 1 [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1917 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2442 | 127.0.0.1
                                                                          Read 1 live rows and 0 tombstone cells [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2654 | 127.0.0.1
                                                                            Enqueuing response to 127.0.0.1:7000 [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2851 | 127.0.0.1
                                                          Sending READ_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-3] | 2019-06-18 12:46:20.880000 | 127.0.0.3 |           3276 | 127.0.0.1
                                                       READ_RSP message received from 127.0.0.3:7000 [Messaging-EventLoop-3-2] | 2019-06-18 12:46:20.881000 | 127.0.0.1 |           7419 | 127.0.0.1
                                                              Processing response from 127.0.0.3:7000 [RequestResponseStage-5] | 2019-06-18 12:46:20.882000 | 127.0.0.1 |           8800 | 127.0.0.1
                   Digest mismatch: Mismatch for key DecoratedKey(8482869187405483569, 6b657932) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9551 | 127.0.0.1
     Enqueuing full data read to Full(127.0.0.3:7000,(3074457345618258602,-9223372036854775808]) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9771 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-6] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9800 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-6] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9880 | 127.0.0.1
                                                                                     Key cache hit for sstable 2 [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10095 | 127.0.0.1
                                                          Sending READ_REQ message to 127.0.0.3:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10121 | 127.0.0.1
                                       Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10235 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10463 | 127.0.0.1
                                                                          Read 0 live rows and 0 tombstone cells [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10598 | 127.0.0.1
                                                       READ_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.885000 | 127.0.0.3 |            105 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-2] | 2019-06-18 12:46:20.885000 | 127.0.0.3 |            665 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |            867 | 127.0.0.1
                                       Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1067 | 127.0.0.1
                                                                                     Key cache hit for sstable 1 [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1349 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1731 | 127.0.0.1
                                                                          Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2019-06-18 12:46:20.887000 | 127.0.0.3 |           1930 | 127.0.0.1
                                                                            Enqueuing response to 127.0.0.1:7000 [ReadStage-2] | 2019-06-18 12:46:20.887000 | 127.0.0.3 |           2634 | 127.0.0.1
                                                          Sending READ_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-3] | 2019-06-18 12:46:20.889000 | 127.0.0.3 |           3510 | 127.0.0.1
                                                       READ_RSP message received from 127.0.0.3:7000 [Messaging-EventLoop-3-2] | 2019-06-18 12:46:20.889000 | 127.0.0.1 |          15665 | 127.0.0.1
                                                              Processing response from 127.0.0.3:7000 [RequestResponseStage-6] | 2019-06-18 12:46:20.890000 | 127.0.0.1 |          16028 | 127.0.0.1
 Sending read-repair-mutation to Full(127.0.0.1:7000,(3074457345618258602,-9223372036854775808]) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.890000 | 127.0.0.1 |          16762 | 127.0.0.1
                                                   Sending READ_REPAIR_REQ message to 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.891000 | 127.0.0.1 |          17413 | 127.0.0.1
                                                READ_REPAIR_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-1] | 2019-06-18 12:46:20.891000 | 127.0.0.1 |          17839 | 127.0.0.1
                                                                                      Appending to commitlog [MutationStage-6] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18064 | 127.0.0.1
                                                                                   Adding to table1 memtable [MutationStage-6] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18219 | 127.0.0.1
                                                   Sending READ_REPAIR_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18494 | 127.0.0.1
                                                READ_REPAIR_RSP message received from 127.0.0.1:7000 [Messaging-EventLoop-3-1] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18767 | 127.0.0.1
                                                              Processing response from 127.0.0.1:7000 [RequestResponseStage-5] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18937 | 127.0.0.1
                                                                                                              Request complete | 2019-06-18 12:46:20.892148 | 127.0.0.1 |          19148 | 127.0.0.1
{noformat}

This in turn will probably delay the collection of data/tombstone as data will keep spreading into new sstables which in turn result in more read-repairs... Perhaps it would make more sense if the purgable tombstone was transferred on read-repair, in which case it probably should have been included in the initial response.

There may be more side effects that I'm overlooking. Would be great if someone with more experience could share their view on this.


was (Author: eperott):
I had a look at this ticket [~laxmikant99].

I'm able to reproduce with your instructions on {{trunk}} as well. It's nothing related to TTL, so a simple delete will give the same result.

From what I can tell, this behavior comes from {{ReadCommand.withoutPurgeableTombstones()}}. Given the title of this ticket, I'm guessing you arrived at the same conclusion. :-)

That method has the following comment:
{code:java}
// Skip purgeable tombstones. We do this because it's safe to do (post-merge of the memtable and sstable at least), it
// can save us some bandwith, and avoid making us throw a TombstoneOverwhelmingException for purgeable tombstones (which
// are to some extend an artefact of compaction lagging behind and hence counting them is somewhat unintuitive).
{code}

So, this seem very intentional. Reasons I can think of for this behavior are:
* The query result is consistent whether compaction happened or not - after all, gc grace has expired
* And as mentioned in the comment; overwhelming amount of tombstones and bandwidth

At the same time, it is annoying to get back resurrected data, when you *know* the tombstone is still there.

But what worries me the most is that a read-repair will be triggered on every read and spread the resurrected data.

{noformat}
cqlsh> select * from test.table1 where key = 'key2';

 key  | col | val
------+-----+-----
 key2 | abc | xyz

(1 rows)

Tracing session: 4f1a9b90-91b6-11e9-a0ff-9d318f160025

 activity                                                                                                                      | timestamp                  | source    | source_elapsed | client
-------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------+-----------
                                                                                                            Execute CQL3 query | 2019-06-18 12:46:20.873000 | 127.0.0.1 |              0 | 127.0.0.1
                                           Parsing select * from test.table1 where key = 'key2'; [Native-Transport-Requests-1] | 2019-06-18 12:46:20.874000 | 127.0.0.1 |            358 | 127.0.0.1
                                                                             Preparing statement [Native-Transport-Requests-1] | 2019-06-18 12:46:20.875000 | 127.0.0.1 |           1331 | 127.0.0.1
                                                              reading digest from 127.0.0.3:7000 [Native-Transport-Requests-1] | 2019-06-18 12:46:20.875000 | 127.0.0.1 |           1922 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2048 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2166 | 127.0.0.1
                                                          Sending READ_REQ message to 127.0.0.3:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2212 | 127.0.0.1
                                                                                     Key cache hit for sstable 2 [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2376 | 127.0.0.1
                                       Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2542 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-2] | 2019-06-18 12:46:20.876000 | 127.0.0.1 |           2772 | 127.0.0.1
                                                                          Read 0 live rows and 0 tombstone cells [ReadStage-2] | 2019-06-18 12:46:20.876001 | 127.0.0.1 |           2899 | 127.0.0.1
                                                       READ_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.877000 | 127.0.0.3 |             73 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1148 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1401 | 127.0.0.1
                                       Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1637 | 127.0.0.1
                                                                                     Key cache hit for sstable 1 [ReadStage-1] | 2019-06-18 12:46:20.878000 | 127.0.0.3 |           1917 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2442 | 127.0.0.1
                                                                          Read 1 live rows and 0 tombstone cells [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2654 | 127.0.0.1
                                                                            Enqueuing response to 127.0.0.1:7000 [ReadStage-1] | 2019-06-18 12:46:20.879000 | 127.0.0.3 |           2851 | 127.0.0.1
                                                          Sending READ_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-3] | 2019-06-18 12:46:20.880000 | 127.0.0.3 |           3276 | 127.0.0.1
                                                       READ_RSP message received from 127.0.0.3:7000 [Messaging-EventLoop-3-2] | 2019-06-18 12:46:20.881000 | 127.0.0.1 |           7419 | 127.0.0.1
                                                              Processing response from 127.0.0.3:7000 [RequestResponseStage-5] | 2019-06-18 12:46:20.882000 | 127.0.0.1 |           8800 | 127.0.0.1
                   Digest mismatch: Mismatch for key DecoratedKey(8482869187405483569, 6b657932) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9551 | 127.0.0.1
     Enqueuing full data read to Full(127.0.0.3:7000,(3074457345618258602,-9223372036854775808]) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9771 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-6] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9800 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-6] | 2019-06-18 12:46:20.883000 | 127.0.0.1 |           9880 | 127.0.0.1
                                                                                     Key cache hit for sstable 2 [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10095 | 127.0.0.1
                                                          Sending READ_REQ message to 127.0.0.3:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10121 | 127.0.0.1
                                       Skipped 0/2 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10235 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10463 | 127.0.0.1
                                                                          Read 0 live rows and 0 tombstone cells [ReadStage-6] | 2019-06-18 12:46:20.884000 | 127.0.0.1 |          10598 | 127.0.0.1
                                                       READ_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.885000 | 127.0.0.3 |            105 | 127.0.0.1
                                                                      Executing single-partition query on table1 [ReadStage-2] | 2019-06-18 12:46:20.885000 | 127.0.0.3 |            665 | 127.0.0.1
                                                                                    Acquiring sstable references [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |            867 | 127.0.0.1
                                       Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1067 | 127.0.0.1
                                                                                     Key cache hit for sstable 1 [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1349 | 127.0.0.1
                                                                       Merged data from memtables and 1 sstables [ReadStage-2] | 2019-06-18 12:46:20.886000 | 127.0.0.3 |           1731 | 127.0.0.1
                                                                          Read 1 live rows and 0 tombstone cells [ReadStage-2] | 2019-06-18 12:46:20.887000 | 127.0.0.3 |           1930 | 127.0.0.1
                                                                            Enqueuing response to 127.0.0.1:7000 [ReadStage-2] | 2019-06-18 12:46:20.887000 | 127.0.0.3 |           2634 | 127.0.0.1
                                                          Sending READ_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-3] | 2019-06-18 12:46:20.889000 | 127.0.0.3 |           3510 | 127.0.0.1
                                                       READ_RSP message received from 127.0.0.3:7000 [Messaging-EventLoop-3-2] | 2019-06-18 12:46:20.889000 | 127.0.0.1 |          15665 | 127.0.0.1
                                                              Processing response from 127.0.0.3:7000 [RequestResponseStage-6] | 2019-06-18 12:46:20.890000 | 127.0.0.1 |          16028 | 127.0.0.1
 Sending read-repair-mutation to Full(127.0.0.1:7000,(3074457345618258602,-9223372036854775808]) [Native-Transport-Requests-1] | 2019-06-18 12:46:20.890000 | 127.0.0.1 |          16762 | 127.0.0.1
                                                   Sending READ_REPAIR_REQ message to 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.891000 | 127.0.0.1 |          17413 | 127.0.0.1
                                                READ_REPAIR_REQ message received from 127.0.0.1:7000 [Messaging-EventLoop-3-1] | 2019-06-18 12:46:20.891000 | 127.0.0.1 |          17839 | 127.0.0.1
                                                                                      Appending to commitlog [MutationStage-6] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18064 | 127.0.0.1
                                                                                   Adding to table1 memtable [MutationStage-6] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18219 | 127.0.0.1
                                                   Sending READ_REPAIR_RSP message to 127.0.0.1:7000 [Messaging-EventLoop-3-4] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18494 | 127.0.0.1
                                                READ_REPAIR_RSP message received from 127.0.0.1:7000 [Messaging-EventLoop-3-1] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18767 | 127.0.0.1
                                                              Processing response from 127.0.0.1:7000 [RequestResponseStage-5] | 2019-06-18 12:46:20.892000 | 127.0.0.1 |          18937 | 127.0.0.1
                                                                                                              Request complete | 2019-06-18 12:46:20.892148 | 127.0.0.1 |          19148 | 127.0.0.1
{noformat}

This in turn will probably delay the collection of data/tombstone as data will keep spreading into new sstables which in turn result in more read-repairs... Perhaps it would make more sense if the purgable tombstone was transferred on repair, in which case it probably should have been included in the initial response.

There may be more side effects that I'm overlooking. Would be great if someone with more experience could share their view on this.

> Purgable tombstones are ignored by read query
> ---------------------------------------------
>
>                 Key: CASSANDRA-15159
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15159
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Legacy/Local Write-Read Paths
>            Reporter: Laxmikant Upadhyay
>            Priority: Normal
>         Attachments: steps2reproduce.txt, trace.txt
>
>
> In 3 node Cassandra cluster, one node has old mutation and two nodes have evict-able (crossed gc grace period) tombstone produced by TTL. A read query with local quorum return the old mutation as result when one request goes to the node having old mutation. However expected result should be empty. 
> Attached the steps to reproduce and trace of read query.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org