Posted to commits@cassandra.apache.org by "Benedict Elliott Smith (Jira)" <ji...@apache.org> on 2019/10/18 12:39:00 UTC

[jira] [Commented] (CASSANDRA-15363) Read repair in mixed mode between 2.1 and 3.0 on COMPACT STORAGE tables causes unreadable sstables after upgrade

    [ https://issues.apache.org/jira/browse/CASSANDRA-15363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16954559#comment-16954559 ] 

Benedict Elliott Smith commented on CASSANDRA-15363:
----------------------------------------------------

To expand upon this for those readers who aren't quite sure what is happening:

* These fake row tombstones are generated for sstable reads when serving point or {{IN}} clause queries, in place of range tombstone (RT) bounds for each slice
* For compact storage tables without a clustering key, {{SELECT}} on a subset of columns gets translated into an {{IN}} clause, since cells are stored as rows; this causes us to generate a "row" deletion for each selected column in a partition, whether or not there was any data for it
* These may then be read-repaired to other nodes _as_ row range tombstones, through {{LegacyLayout.fromRow}} translation, which make no sense for compact storage tables
* After upgrade, the 3.0 nodes attempt to read the data that was written to them by read repair while they were still 2.1 nodes, and {{LegacyLayout}} complains that these range tombstones occurring within a static row make no sense (except for really weird and implausible thrift use cases we've never encountered, and so never implemented)
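
To make the second bullet concrete, here is a toy model of the behaviour; it is not Cassandra code. The class {{FakeTombstoneSketch}}, the method {{fakeRowDeletions}} and its parameters are all hypothetical names invented for illustration. It only shows that one "row" deletion is produced per selected column, whether or not the column holds any data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these names exist in Cassandra. */
final class FakeTombstoneSketch {

    /**
     * Models a names ("IN") read over one partition of a compact storage
     * table without a clustering key. Each selected column is treated as a
     * row, and a deletion marker is produced for it.
     *
     * @param selectedColumns  columns named in the SELECT
     * @param columnsWithData  columns that actually hold a cell (assumed input)
     * @return one marker string per selected column
     */
    public static List<String> fakeRowDeletions(List<String> selectedColumns,
                                                Set<String> columnsWithData,
                                                long activeDeletionTimestamp) {
        List<String> markers = new ArrayList<>();
        for (String column : selectedColumns) {
            // columnsWithData is deliberately never consulted: the marker is
            // produced whether or not there was any data for this column.
            markers.add(column + "@" + activeDeletionTimestamp);
        }
        return markers;
    }
}
```

In this model, selecting {{b}} and {{v}} from a partition with no live cells still yields two markers, which mirrors the two tombstone entries in the sstable dump quoted in the issue description.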

The fix resolves this by ensuring this can never occur in the "static" compact storage case, which simply means a compact storage table with no clustering key. This works because we cannot have a range tombstone deletion of these rows; they can only be deleted through cell deletes (by setting the column to null) or by partition-level deletes. So we can always expect that {{activeDeletion == partitionLevelDeletion}} in this case. Since generating these tombstones is unnecessary in this case anyway, we get a minor optimisation for all use cases and also avoid this problem.
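
The shape of that guard can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual patch: {{StaticCompactGuardSketch}}, {{isStaticCompact}} and {{rowTombstonesFor}} are hypothetical names, and the non-static branch is just a placeholder for whatever the existing behaviour is.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model only: none of these names exist in Cassandra. */
final class StaticCompactGuardSketch {

    /** "Static" compact storage: a compact storage table with no clustering key. */
    public static boolean isStaticCompact(boolean compactStorage, int clusteringColumns) {
        return compactStorage && clusteringColumns == 0;
    }

    /**
     * Returns the names for which a fake row tombstone would be produced.
     * For static compact tables nothing is produced: such rows can only be
     * removed by cell deletes or partition-level deletes, so the active
     * deletion always equals the partition-level deletion.
     */
    public static List<String> rowTombstonesFor(List<String> requestedNames,
                                                boolean compactStorage,
                                                int clusteringColumns) {
        if (isStaticCompact(compactStorage, clusteringColumns)) {
            return Collections.emptyList();
        }
        // Placeholder for the pre-existing behaviour (assumption): one marker
        // per requested name.
        return new ArrayList<>(requestedNames);
    }
}
```

With the table from the issue description (compact storage, no clustering key), {{rowTombstonesFor}} returns an empty list in this model, so nothing beyond the partition-level deletion would be passed on by read repair.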

> Read repair in mixed mode between 2.1 and 3.0 on COMPACT STORAGE tables causes unreadable sstables after upgrade
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15363
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15363
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Marcus Eriksson
>            Assignee: Marcus Eriksson
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x
>
>
> If we have a table like this:
> {{CREATE TABLE tbl (pk ascii, b boolean, v blob, PRIMARY KEY (pk)) WITH COMPACT STORAGE}}
> with a cluster where node1 is 2.1 and node2 is 3.0 (during upgrade):
> * node2 coordinates a delete {{DELETE FROM tbl WHERE pk = 'something'}} which node1 does not get
> * node1 coordinates a quorum read {{SELECT * FROM tbl WHERE pk = 'something'}} which causes a read repair
> * this makes node1 flush an sstable like this:
> {code}
> [
> {"key": "something",
>  "metadata": {"deletionInfo": {"markedForDeleteAt":1571388944364000,"localDeletionTime":1571388944}},
>  "cells": [["b","b",1571388944364000,"t",1571388944],
>            ["v","v",1571388944364000,"t",1571388944]]}
> ]
> {code}
> (It has range tombstones which are covered by the partition deletion)
> Then, when we upgrade this node to 3.0 and try to read or run upgradesstables, we get this:
> {code}
> ERROR [node1_CompactionExecutor:1] node1 2019-10-18 10:44:11,325 DebuggableThreadPoolExecutor.java:242 - Error in ThreadPoolExecutor
> java.lang.UnsupportedOperationException: null
> 	at org.apache.cassandra.db.LegacyLayout.extractStaticColumns(LegacyLayout.java:779) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:120) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:57) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:362) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:103) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:94) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.utils.MergeIterator$OneToOne.computeNext(MergeIterator.java:442) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:144) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:92) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:227) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:190) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:89) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:675) ~[dtest-3.0.19.jar:na]
> 	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[dtest-3.0.19.jar:na]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_121]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
> 	at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83) [dtest-3.0.19.jar:na]
> 	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {code}


