You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by sl...@apache.org on 2020/05/27 15:19:29 UTC

[cassandra] branch cassandra-3.0 updated (a4b6deb -> c8a2834)

This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a change to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


    from a4b6deb  Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid races
     new 8358e19  Fix legacy handling of RangeTombstone with collection ones
     new c8a2834  Fix LegacyLayout handling of non-selected collection tombstones

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 CHANGES.txt                                        |   1 +
 src/java/org/apache/cassandra/db/LegacyLayout.java | 105 +++++++++++++----
 .../cassandra/db/UnfilteredDeserializer.java       | 129 ++++++++++++++++-----
 .../upgrade/MixedModeRangeTombstoneTest.java       |  73 ++++++++++++
 .../org/apache/cassandra/db/LegacyLayoutTest.java  | 102 +++++++++++++---
 5 files changed, 340 insertions(+), 70 deletions(-)
 create mode 100644 test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

[cassandra] 01/02: Fix legacy handling of RangeTombstone with collection ones

Posted by sl...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 8358e19840d352475a5831d130ff3c43a11f2f4e
Author: Sylvain Lebresne <le...@gmail.com>
AuthorDate: Fri May 8 18:12:55 2020 +0200

    Fix legacy handling of RangeTombstone with collection ones
    
    When a multi-row range tombstone interacts with a a collection tombstone
    within one of a covered row, the resulting range tombstone in the legacy
    format will start in the middle of the row and extend past said row and
    it needs special handling.
    
    Before this commit, the code deserializing that RT was making it
    artificially start at the end of the row (in which the collection
    tombstone is), but that means that when `LegacyLayout.CellGrouper`
    encountered it, it decided the row was finished, even if it was not,
    leading to potential row duplication.
    
    The patch solves this by:
    1. making that problematic tombstone start at the beginning of the row
    instead of its end (to avoid code deciding the row is over).
    2. modify `UnfilteredDeserializer` to 'split' that range tombstone into
    a row tombstone for the row it covers, which is handled as a normal row
    tombstone, and push the rest of the range tombstone (that starts after
    the row and extends to the original end of the RT) to be handled after
    that row is fully "grouped".
    
    The patch also removes the possibility of getting an empty row from
    `LegacyLayout#getNextRow` to avoid theoretical problems with that.
    
    Patch by Sylvain Lebresne; reviewed by Marcus Eriksson & Aleksey
    Yeschenko for CASSANDRA-15805
---
 CHANGES.txt                                        |   1 +
 src/java/org/apache/cassandra/db/LegacyLayout.java |  99 ++++++++++++----
 .../cassandra/db/UnfilteredDeserializer.java       | 129 ++++++++++++++++-----
 .../upgrade/MixedModeRangeTombstoneTest.java       |  73 ++++++++++++
 4 files changed, 252 insertions(+), 50 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index cdb9ad0..46b3f56 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 3.0.21
+ * Fix duplicated row on 2.x upgrades when multi-rows range tombstones interact with collection ones (CASSANDRA-15805)
  * Rely on snapshotted session infos on StreamResultFuture.maybeComplete to avoid race conditions (CASSANDRA-15667)
  * EmptyType doesn't override writeValue so could attempt to write bytes when expected not to (CASSANDRA-15790)
  * Fix index queries on partition key columns when some partitions contains only static data (CASSANDRA-13666)
diff --git a/src/java/org/apache/cassandra/db/LegacyLayout.java b/src/java/org/apache/cassandra/db/LegacyLayout.java
index 37cc935..39dd54a 100644
--- a/src/java/org/apache/cassandra/db/LegacyLayout.java
+++ b/src/java/org/apache/cassandra/db/LegacyLayout.java
@@ -1115,7 +1115,7 @@ public abstract class LegacyLayout
         return true;
     }
 
-    private static Comparator<LegacyAtom> legacyAtomComparator(CFMetaData metadata)
+    static Comparator<LegacyAtom> legacyAtomComparator(CFMetaData metadata)
     {
         return (o1, o2) ->
         {
@@ -1373,8 +1373,24 @@ public abstract class LegacyLayout
             this.hasValidCells = false;
         }
 
+        /**
+         * Try adding the provided atom to the currently grouped row.
+         *
+         * @param atom the new atom to try to add. This <b>must</b> be a "row" atom, that is either a cell or a legacy
+         *             range tombstone that covers only one row (row deletion) or a subset of it (collection
+         *             deletion). Meaning that legacy range tombstone covering multiple rows (that should be handled as
+         *             legit range tombstone in the new storage engine) should be handled separately. Atoms should also
+         *             be provided in proper clustering order.
+         * @return {@code true} if the provided atom has been "consumed" by this grouper (this does _not_ mean the
+         *          atom has been "used" by the grouper as the grouper will skip some shadowed atoms for instance, just
+         *          that {@link #getRow()} shouldn't be called just yet if there is more atom in the atom iterator we're
+         *          grouping). {@code false} otherwise, that is if the row currently built by this grouper is done
+         *          _without_ the provided atom being "consumed" (and so {@link #getRow()} should be called and the
+         *          grouper resetted, after which the provided atom should be provided again).
+         */
         public boolean addAtom(LegacyAtom atom)
         {
+            assert atom.isRowAtom(metadata) : "Unexpected non in-row legacy range tombstone " + atom;
             return atom.isCell()
                  ? addCell(atom.asCell())
                  : addRangeTombstone(atom.asRangeTombstone());
@@ -1472,11 +1488,16 @@ public abstract class LegacyLayout
         private boolean addRangeTombstone(LegacyRangeTombstone tombstone)
         {
             if (tombstone.isRowDeletion(metadata))
+            {
                 return addRowTombstone(tombstone);
-            else if (tombstone.isCollectionTombstone())
-                return addCollectionTombstone(tombstone);
+            }
             else
-                return addGenericRangeTombstone(tombstone);
+            {
+                // The isRowAtom() assertion back in addAtom would have already triggered otherwise, but spelling it
+                // out nonetheless.
+                assert tombstone.isCollectionTombstone();
+                return addCollectionTombstone(tombstone);
+            }
         }
 
         private boolean addRowTombstone(LegacyRangeTombstone tombstone)
@@ -1545,24 +1566,32 @@ public abstract class LegacyLayout
             return true;
         }
 
-        private boolean addGenericRangeTombstone(LegacyRangeTombstone tombstone)
+        /**
+         * Whether the provided range tombstone starts strictly after the current row of the cell grouper (if no row is
+         * currently started, this return false).
+         */
+        public boolean startsAfterCurrentRow(LegacyRangeTombstone rangeTombstone)
         {
-            /*
-             * We can see a non-collection, non-row deletion in two scenarios:
-             *
-             * 1. Most commonly, the tombstone's start bound is bigger than current row's clustering, which means that
-             *    the current row is over, and we should move on to the next row or RT;
-             *
-             * 2. Less commonly, the tombstone's start bound is smaller than current row's clustering, which means that
-             *    we've crossed an index boundary and are seeing a non-closed RT from the previous block, repeated;
-             *    we should ignore it and stay in the current row.
-             *
-             *  In either case, clustering should be non-null, or we shouldn't have gotten to this method at all
-             *  However, to be absolutely SURE we're in case two above, we check here.
-             */
-            return clustering != null && metadata.comparator.compare(clustering, tombstone.start.bound.clustering()) > 0;
+            return clustering != null && metadata.comparator.compare(rangeTombstone.start.bound, clustering) > 0;
+        }
+
+        /**
+         * The clustering of the current row of the cell grouper, or {@code null} if no row is currently started.
+         */
+        public Clustering currentRowClustering()
+        {
+            return clustering;
         }
 
+        /**
+         * Generates the row currently grouped by this grouper and reset it for the following row.
+         * <p>
+         * Note that the only correct way to call this is when either all the atom we're trying to group has been
+         * consumed, or when {@link #addAtom(LegacyAtom)} returns {@code false}.
+         *
+         * @return the current row that has been grouped, or {@code null} in the rare case where all the atoms
+         * "consumed" by {@link #addAtom(LegacyAtom)} for this row were skipped (we skip atoms under a few conditions).
+         */
         public Row getRow()
         {
             if (!hasValidCells && invalidLivenessInfo != null)
@@ -1718,6 +1747,12 @@ public abstract class LegacyLayout
 
         public LegacyCell asCell();
         public LegacyRangeTombstone asRangeTombstone();
+
+        /**
+         * Whether the atom is one that becomes part of a {@link Row} in the new storage engine, meaning it is either
+         * as cell or a legacy range tombstone that covers a single row, or parts of one.
+         */
+        public boolean isRowAtom(CFMetaData metadata);
     }
 
     /**
@@ -1835,6 +1870,12 @@ public abstract class LegacyLayout
             throw new UnsupportedOperationException();
         }
 
+        @Override
+        public boolean isRowAtom(CFMetaData metaData)
+        {
+            return true;
+        }
+
         public boolean isCounter()
         {
             return kind == Kind.COUNTER;
@@ -1889,9 +1930,9 @@ public abstract class LegacyLayout
             if ((start.collectionName == null) != (stop.collectionName == null))
             {
                 if (start.collectionName == null)
-                    stop = new LegacyBound(stop.bound, stop.isStatic, null);
+                    stop = new LegacyBound(Slice.Bound.inclusiveEndOf(stop.bound.values), stop.isStatic, null);
                 else
-                    start = new LegacyBound(start.bound, start.isStatic, null);
+                    start = new LegacyBound(Slice.Bound.inclusiveStartOf(start.bound.values), start.isStatic, null);
             }
             else if (!Objects.equals(start.collectionName, stop.collectionName))
             {
@@ -1918,11 +1959,21 @@ public abstract class LegacyLayout
             return new LegacyRangeTombstone(newStart, stop, deletionTime);
         }
 
+        public LegacyRangeTombstone withNewStart(Slice.Bound newStart)
+        {
+            return withNewStart(new LegacyBound(newStart, start.isStatic, null));
+        }
+
         public LegacyRangeTombstone withNewEnd(LegacyBound newStop)
         {
             return new LegacyRangeTombstone(start, newStop, deletionTime);
         }
 
+        public LegacyRangeTombstone withNewEnd(Slice.Bound newEnd)
+        {
+            return withNewEnd(new LegacyBound(newEnd, stop.isStatic, null));
+        }
+
         public boolean isCell()
         {
             return false;
@@ -1943,6 +1994,12 @@ public abstract class LegacyLayout
             return this;
         }
 
+        @Override
+        public boolean isRowAtom(CFMetaData metadata)
+        {
+            return isCollectionTombstone() || isRowDeletion(metadata);
+        }
+
         public boolean isCollectionTombstone()
         {
             return start.collectionName != null;
diff --git a/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java b/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
index 62ad76a..2d270bc 100644
--- a/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
+++ b/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
@@ -480,16 +480,7 @@ public abstract class UnfilteredDeserializer
                 this.helper = helper;
                 this.grouper = new LegacyLayout.CellGrouper(metadata, helper);
                 this.tombstoneTracker = new TombstoneTracker(partitionDeletion);
-                this.atoms = new AtomIterator(atomReader);
-            }
-
-            private boolean isRow(LegacyLayout.LegacyAtom atom)
-            {
-                if (atom.isCell())
-                    return true;
-
-                LegacyLayout.LegacyRangeTombstone tombstone = atom.asRangeTombstone();
-                return tombstone.isCollectionTombstone() || tombstone.isRowDeletion(metadata);
+                this.atoms = new AtomIterator(atomReader, metadata);
             }
 
             public boolean hasNext()
@@ -515,7 +506,7 @@ public abstract class UnfilteredDeserializer
                             if (tombstoneTracker.isShadowed(atom))
                                 continue;
 
-                            if (isRow(atom))
+                            if (atom.isRowAtom(metadata))
                                 next = readRow(atom);
                             else
                                 tombstoneTracker.openNew(atom.asRangeTombstone());
@@ -539,13 +530,57 @@ public abstract class UnfilteredDeserializer
                                                  ? LegacyLayout.CellGrouper.staticGrouper(metadata, helper)
                                                  : this.grouper;
                 grouper.reset();
+                // We know the first atom is not shadowed and is a "row" atom, so can be added blindly.
                 grouper.addAtom(first);
-                // As long as atoms are part of the same row, consume them. Note that the call to addAtom() uses
-                // atoms.peek() so that the atom is only consumed (by next) if it's part of the row (addAtom returns true)
-                while (atoms.hasNext() && grouper.addAtom(atoms.peek()))
+
+                // We're less sure about the next atoms. In particular, CellGrouper want to make sure we only pass it
+                // "row" atoms (it's the only type it knows how to handle) so we should handle anything else.
+                while (atoms.hasNext())
                 {
-                    atoms.next();
+                    // Peek, but don't consume the next atom just yet
+                    LegacyLayout.LegacyAtom atom = atoms.peek();
+                    // First, that atom may be shadowed in which case we can simply ignore it. Note that this handles
+                    // the case of repeated RT start marker after we've crossed an index boundary, which could well
+                    // appear in the middle of a row (CASSANDRA-14008).
+                    if (!tombstoneTracker.hasClosingMarkerBefore(atom) && tombstoneTracker.isShadowed(atom))
+                    {
+                        atoms.next(); // consume the atom since we only peeked it so far
+                        continue;
+                    }
+
+                    // Second, we should only pass "row" atoms to the cell grouper
+                    if (atom.isRowAtom(metadata))
+                    {
+                        if (!grouper.addAtom(atom))
+                            break; // done with the row; don't consume the atom
+                        atoms.next(); // the grouper "accepted" the atom, consume it since we only peeked above
+                    }
+                    else
+                    {
+                        LegacyLayout.LegacyRangeTombstone rt = (LegacyLayout.LegacyRangeTombstone) atom;
+                        // This means we have a non-row range tombstone. Unfortunately, that does not guarantee the
+                        // current row is finished (though it may), because due to the logic within LegacyRangeTombstone
+                        // constructor, we can get an out-of-order RT that includes on the current row (even if it is
+                        // already started) and extends past it.
+
+                        // So first, evacuate the easy case of the range tombstone simply starting after the current
+                        // row, in which case we're done with the current row (but don't consume the new RT yet so it
+                        // gets handled as any other non-row RT).
+                        if (grouper.startsAfterCurrentRow(rt))
+                            break;
+
+                        // Otherwise, we "split" the RT in 2: the part covering the current row, which is now an
+                        // inRowAtom and can be passed to the grouper, and the part after that, which we push back into
+                        // the iterator for later processing.
+                        Clustering currentRow = grouper.currentRowClustering();
+                        atoms.next(); // consume since we had only just peeked it so far and we're using it
+                        atoms.pushOutOfOrder(rt.withNewStart(Slice.Bound.exclusiveStartOf(currentRow)));
+                        // Note: in theory the withNewStart is a no-op here, but not taking any risk
+                        grouper.addAtom(rt.withNewStart(Slice.Bound.inclusiveStartOf(currentRow))
+                                          .withNewEnd(Slice.Bound.inclusiveEndOf(currentRow)));
+                    }
                 }
+
                 return grouper.getRow();
             }
 
@@ -583,51 +618,87 @@ public abstract class UnfilteredDeserializer
             private static class AtomIterator implements PeekingIterator<LegacyLayout.LegacyAtom>
             {
                 private final Supplier<LegacyLayout.LegacyAtom> atomReader;
-                private boolean isDone;
+                private boolean readerExhausted;
                 private LegacyLayout.LegacyAtom next;
 
-                private AtomIterator(Supplier<LegacyLayout.LegacyAtom> atomReader)
+                private final Comparator<LegacyLayout.LegacyAtom> atomComparator;
+                // May temporarily store atoms that needs to be handler later than when they were deserialized.
+                // Lazily initialized since it is used infrequently.
+                private Queue<LegacyLayout.LegacyAtom> outOfOrderAtoms;
+
+                private AtomIterator(Supplier<LegacyLayout.LegacyAtom> atomReader, CFMetaData metadata)
                 {
                     this.atomReader = atomReader;
+                    this.atomComparator = LegacyLayout.legacyAtomComparator(metadata);
                 }
 
                 public boolean hasNext()
                 {
-                    if (isDone)
-                        return false;
+                    if (readerExhausted)
+                        return hasOutOfOrderAtoms(); // We have to return out of order atoms when reader exhausts
 
+                    // Note that next() and peek() assumes that next has been set by this method, so we do it even if
+                    // we have some outOfOrderAtoms stacked up.
                     if (next == null)
-                    {
                         next = atomReader.get();
-                        if (next == null)
-                        {
-                            isDone = true;
-                            return false;
-                        }
-                    }
-                    return true;
+
+                    readerExhausted = next == null;
+                    return !readerExhausted || hasOutOfOrderAtoms();
                 }
 
                 public LegacyLayout.LegacyAtom next()
                 {
                     if (!hasNext())
                         throw new UnsupportedOperationException();
+
+                    if (hasOutOrderAtomBeforeNext())
+                        return outOfOrderAtoms.poll();
+
                     LegacyLayout.LegacyAtom toReturn = next;
                     next = null;
                     return toReturn;
                 }
 
+                private boolean hasOutOfOrderAtoms()
+                {
+                    return outOfOrderAtoms != null && !outOfOrderAtoms.isEmpty();
+                }
+
+                private boolean hasOutOrderAtomBeforeNext()
+                {
+                    // Note that if outOfOrderAtoms is null, the first condition will be false, so we can save a null
+                    // check on calling `outOfOrderAtoms.peek()` in the right branch.
+                    return hasOutOfOrderAtoms()
+                           && (next == null || atomComparator.compare(outOfOrderAtoms.peek(), next) <= 0);
+                }
+
                 public LegacyLayout.LegacyAtom peek()
                 {
                     if (!hasNext())
                         throw new UnsupportedOperationException();
+                    if (hasOutOrderAtomBeforeNext())
+                        return outOfOrderAtoms.peek();
                     return next;
                 }
 
+                /**
+                 * Push back an atom in the iterator assuming said atom sorts strictly _after_ the atom returned by
+                 * the last next() call (meaning the pushed atom fall in the part of the iterator that has not been
+                 * returned yet, not before). The atom will then be returned by the iterator in proper order.
+                 */
+                public void pushOutOfOrder(LegacyLayout.LegacyAtom atom)
+                {
+                    if (outOfOrderAtoms == null)
+                        outOfOrderAtoms = new PriorityQueue<>(atomComparator);
+                    outOfOrderAtoms.offer(atom);
+                }
+
                 public void clearState()
                 {
                     this.next = null;
-                    this.isDone = false;
+                    this.readerExhausted = false;
+                    if (outOfOrderAtoms != null)
+                        outOfOrderAtoms.clear();
                 }
 
                 public void remove()
@@ -685,7 +756,7 @@ public abstract class UnfilteredDeserializer
                     if (partitionDeletion.deletes(timestamp))
                         return true;
 
-                    SortedSet<LegacyLayout.LegacyRangeTombstone> coveringTombstones = isRow(atom) ? openTombstones : openTombstones.tailSet(atom.asRangeTombstone());
+                    SortedSet<LegacyLayout.LegacyRangeTombstone> coveringTombstones = atom.isRowAtom(metadata) ? openTombstones : openTombstones.tailSet(atom.asRangeTombstone());
                     return Iterables.any(coveringTombstones, tombstone -> tombstone.deletionTime.deletes(timestamp));
                 }
 
diff --git a/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java b/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java
new file mode 100644
index 0000000..e4b3a17
--- /dev/null
+++ b/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeRangeTombstoneTest.java
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.distributed.upgrade;
+
+import org.junit.Test;
+
+import org.apache.cassandra.distributed.api.ConsistencyLevel;
+import org.apache.cassandra.distributed.shared.DistributedTestBase;
+import org.apache.cassandra.distributed.shared.Versions;
+
+import static java.lang.String.format;
+import static org.apache.cassandra.distributed.shared.AssertUtils.assertRows;
+import static org.apache.cassandra.distributed.shared.AssertUtils.row;
+
+/**
+ * Tests related to the handle of range tombstones during 2.x to 3.x upgrades.
+ */
+public class MixedModeRangeTombstoneTest extends UpgradeTestBase
+{
+    /**
+     * Tests the interaction of range tombstones covering multiple rows and collection tombsones within the covered
+     * rows.
+     *
+     * <p>This test reproduces the issue of CASSANDRA-15805.
+     */
+    @Test
+    public void multiRowsRangeTombstoneAndCollectionTombstoneInteractionTest() throws Throwable {
+        String tableName = DistributedTestBase.KEYSPACE + ".t";
+        String schema = "CREATE TABLE " + tableName + " (" +
+                        "  k int," +
+                        "  c1 text," +
+                        "  c2 text," +
+                        "  a text," +
+                        "  b set<text>," +
+                        "  c text," +
+                        "  PRIMARY KEY((k), c1, c2)" +
+                        " )";
+
+
+        new TestCase()
+        .nodes(2)
+        .upgrade(Versions.Major.v22, Versions.Major.v30)
+        .setup(cluster -> {
+            cluster.schemaChange(schema);
+            cluster.coordinator(1).execute(format("DELETE FROM %s USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A'", tableName), ConsistencyLevel.ALL);
+            cluster.coordinator(1).execute(format("INSERT INTO %s(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 'bar') USING TIMESTAMP 2", tableName), ConsistencyLevel.ALL);
+            cluster.coordinator(1).execute(format("DELETE b FROM %s USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X'", tableName), ConsistencyLevel.ALL);
+            cluster.get(1).flush(DistributedTestBase.KEYSPACE);
+            cluster.get(2).flush(DistributedTestBase.KEYSPACE);
+        })
+        .runAfterNodeUpgrade((cluster, node) -> {
+            assertRows(cluster.coordinator(node).execute(format("SELECT * FROM %s", tableName), ConsistencyLevel.ALL),
+                       row(0, "A", "X", "foo", null, "bar"));
+        })
+        .run();
+    }
+}


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

[cassandra] 02/02: Fix LegacyLayout handling of non-selected collection tombstones

Posted by sl...@apache.org.

This is an automated email from the ASF dual-hosted git repository.

slebresne pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit c8a2834606d683ba9945e9cc11bdb4207ce269d1
Author: Sylvain Lebresne <le...@gmail.com>
AuthorDate: Wed May 13 11:44:08 2020 +0200

    Fix LegacyLayout handling of non-selected collection tombstones
    
    If a collection tombstone is not included by a query, it can be ignored,
    but it currently made `LegacyLayout.CellGrouper#addCollectionTombstone`
    return `false`, which made it stop the current row, which is incorrect
    (this can potentially lead to a duplicate row). This patch changes it
    to return `true`.
    
    Patch by Sylvain Lebresne; reviewed by Marcus Eriksson & Aleksey
    Yeschenko for CASSANDRA-15805
---
 src/java/org/apache/cassandra/db/LegacyLayout.java |   6 +-
 .../org/apache/cassandra/db/LegacyLayoutTest.java  | 102 +++++++++++++++++----
 2 files changed, 88 insertions(+), 20 deletions(-)

diff --git a/src/java/org/apache/cassandra/db/LegacyLayout.java b/src/java/org/apache/cassandra/db/LegacyLayout.java
index 39dd54a..8492de5 100644
--- a/src/java/org/apache/cassandra/db/LegacyLayout.java
+++ b/src/java/org/apache/cassandra/db/LegacyLayout.java
@@ -1537,8 +1537,12 @@ public abstract class LegacyLayout
 
         private boolean addCollectionTombstone(LegacyRangeTombstone tombstone)
         {
+            // If the collection tombstone is not included in the query (which technically would only apply to thrift
+            // queries since CQL one "fetch" everything), we can skip it (so return), but we're problably still within
+            // the current row so we return `true`. Technically, it is possible that tombstone belongs to another row
+            // that the row currently grouped, but as we ignore it, returning `true` is ok in that case too.
             if (!helper.includes(tombstone.start.collectionName))
-                return false; // see CASSANDRA-13109
+                return true; // see CASSANDRA-13109
 
             // The helper needs to be informed about the current complex column identifier before
             // it can perform the comparison between the recorded drop time and the RT deletion time.
diff --git a/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java b/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java
index 0bb2459..f0d2a02 100644
--- a/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java
+++ b/test/unit/org/apache/cassandra/db/LegacyLayoutTest.java
@@ -24,18 +24,19 @@ import java.nio.file.Files;
 import java.nio.file.Path;
 import java.nio.file.Paths;
 
+import org.apache.cassandra.db.LegacyLayout.CellGrouper;
+import org.apache.cassandra.db.LegacyLayout.LegacyBound;
+import org.apache.cassandra.db.LegacyLayout.LegacyCell;
+import org.apache.cassandra.db.LegacyLayout.LegacyRangeTombstone;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.marshal.MapType;
 import org.apache.cassandra.db.marshal.UTF8Type;
-import org.apache.cassandra.db.partitions.ImmutableBTreePartition;
 import org.apache.cassandra.db.rows.BufferCell;
 import org.apache.cassandra.db.rows.Cell;
 import org.apache.cassandra.db.rows.RowIterator;
-import org.apache.cassandra.db.rows.Rows;
 import org.apache.cassandra.db.rows.SerializationHelper;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer;
-import org.apache.cassandra.db.rows.UnfilteredRowIterators;
 import org.apache.cassandra.db.transform.FilteredRows;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.util.DataInputBuffer;
@@ -62,10 +63,10 @@ import org.apache.cassandra.db.rows.BTreeRow;
 import org.apache.cassandra.db.rows.Row;
 import org.apache.cassandra.dht.Murmur3Partitioner;
 import org.apache.cassandra.schema.KeyspaceParams;
-import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.Hex;
 
 import static org.apache.cassandra.net.MessagingService.VERSION_21;
+import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
 import static org.junit.Assert.*;
 
 public class LegacyLayoutTest
@@ -98,7 +99,7 @@ public class LegacyLayoutTest
         builder.addComplexDeletion(b, new DeletionTime(1L, 1));
         Row row = builder.build();
 
-        ByteBuffer key = ByteBufferUtil.bytes(1);
+        ByteBuffer key = bytes(1);
         PartitionUpdate upd = PartitionUpdate.singleRowUpdate(table, key, row);
 
         LegacyLayout.LegacyUnfilteredPartition p = LegacyLayout.fromUnfilteredRowIterator(null, upd.unfilteredIterator());
@@ -216,7 +217,7 @@ public class LegacyLayoutTest
         builder.addCell(new BufferCell(v, 1L, Cell.NO_TTL, Cell.NO_DELETION_TIME, Int32Serializer.instance.serialize(1), null));
         Row row = builder.build();
 
-        DecoratedKey pk = table.decorateKey(ByteBufferUtil.bytes(1));
+        DecoratedKey pk = table.decorateKey(bytes(1));
         PartitionUpdate upd = PartitionUpdate.singleRowUpdate(table, pk, row, staticRow);
 
         try (RowIterator before = FilteredRows.filter(upd.unfilteredIterator(), FBUtilities.nowInSeconds());
@@ -243,7 +244,7 @@ public class LegacyLayoutTest
         builder.addComplexDeletion(bug, new DeletionTime(1L, 1));
         Row row = builder.build();
 
-        DecoratedKey pk = table.decorateKey(ByteBufferUtil.bytes(1));
+        DecoratedKey pk = table.decorateKey(bytes(1));
         PartitionUpdate upd = PartitionUpdate.singleRowUpdate(table, pk, row);
 
         UnfilteredRowIterator afterRoundTripVia32 = roundTripVia21(upd.unfilteredIterator());
@@ -277,7 +278,7 @@ public class LegacyLayoutTest
         builder.addComplexDeletion(bug, new DeletionTime(1L, 1));
         Row row = builder.build();
 
-        DecoratedKey pk = table.decorateKey(ByteBufferUtil.bytes(1));
+        DecoratedKey pk = table.decorateKey(bytes(1));
         PartitionUpdate upd = PartitionUpdate.singleRowUpdate(table, pk, row);
 
         // we need to perform the round trip in two parts here, with a column drop inbetween
@@ -385,26 +386,89 @@ public class LegacyLayoutTest
         SerializationHelper helper = new SerializationHelper(cfm, MessagingService.VERSION_22, SerializationHelper.Flag.LOCAL, ColumnFilter.all(cfm));
         LegacyLayout.CellGrouper cg = new LegacyLayout.CellGrouper(cfm, helper);
 
-        Slice.Bound startBound = Slice.Bound.create(ClusteringPrefix.Kind.INCL_START_BOUND, new ByteBuffer[] {ByteBufferUtil.bytes(2)});
-        Slice.Bound endBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_END_BOUND, new ByteBuffer[] {ByteBufferUtil.bytes(2)});
-        LegacyLayout.LegacyBound start = new LegacyLayout.LegacyBound(startBound, false, cfm.getColumnDefinition(ByteBufferUtil.bytes("v")));
-        LegacyLayout.LegacyBound end = new LegacyLayout.LegacyBound(endBound, false, cfm.getColumnDefinition(ByteBufferUtil.bytes("v")));
+        Slice.Bound startBound = Slice.Bound.create(ClusteringPrefix.Kind.INCL_START_BOUND, new ByteBuffer[] {bytes(2)});
+        Slice.Bound endBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_END_BOUND, new ByteBuffer[] {bytes(2)});
+        LegacyLayout.LegacyBound start = new LegacyLayout.LegacyBound(startBound, false, cfm.getColumnDefinition(bytes("v")));
+        LegacyLayout.LegacyBound end = new LegacyLayout.LegacyBound(endBound, false, cfm.getColumnDefinition(bytes("v")));
         LegacyLayout.LegacyRangeTombstone lrt = new LegacyLayout.LegacyRangeTombstone(start, end, new DeletionTime(2, 1588598040));
         assertTrue(cg.addAtom(lrt));
 
         // add a real cell
         LegacyLayout.LegacyCell cell = new LegacyLayout.LegacyCell(LegacyLayout.LegacyCell.Kind.REGULAR,
-                                                                   new LegacyLayout.LegacyCellName(new Clustering(ByteBufferUtil.bytes(2)),
-                                                                                                   cfm.getColumnDefinition(ByteBufferUtil.bytes("v")),
-                                                                                                   ByteBufferUtil.bytes("g")),
-                                                                   ByteBufferUtil.bytes("v"), 3, Integer.MAX_VALUE, 0);
+                                                                   new LegacyLayout.LegacyCellName(new Clustering(bytes(2)),
+                                                                                                   cfm.getColumnDefinition(bytes("v")),
+                                                                                                   bytes("g")),
+                                                                   bytes("v"), 3, Integer.MAX_VALUE, 0);
         assertTrue(cg.addAtom(cell));
 
         // add legacy range tombstone where collection name is null for the end bound (this gets translated to a row tombstone)
-        startBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_START_BOUND, new ByteBuffer[] {ByteBufferUtil.bytes(2)});
-        endBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_END_BOUND, new ByteBuffer[] {ByteBufferUtil.bytes(2)});
-        start = new LegacyLayout.LegacyBound(startBound, false, cfm.getColumnDefinition(ByteBufferUtil.bytes("v")));
+        startBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_START_BOUND, new ByteBuffer[] {bytes(2)});
+        endBound = Slice.Bound.create(ClusteringPrefix.Kind.EXCL_END_BOUND, new ByteBuffer[] {bytes(2)});
+        start = new LegacyLayout.LegacyBound(startBound, false, cfm.getColumnDefinition(bytes("v")));
         end = new LegacyLayout.LegacyBound(endBound, false, null);
         assertTrue(cg.addAtom(new LegacyLayout.LegacyRangeTombstone(start, end, new DeletionTime(1, 1588598040))));
     }
+
+    private static LegacyCell cell(Clustering clustering, ColumnDefinition column, ByteBuffer value, long timestamp)
+    {
+        return new LegacyCell(LegacyCell.Kind.REGULAR,
+                              new LegacyLayout.LegacyCellName(clustering, column, null),
+                              value,
+                              timestamp,
+                              Cell.NO_DELETION_TIME,
+                              Cell.NO_TTL);
+    }
+
+    /**
+     * This tests that when {@link CellGrouper} gets a collection tombstone for
+     * a non-fetched collection, then that tombstone does not incorrectly stop the grouping of the current row, as
+     * was done before CASSANDRA-15805.
+     *
+     * <p>Please note that this rely on a query only _fetching_ some of the table columns, which in practice only
+     * happens for thrift queries, and thrift queries shouldn't mess up with CQL tables and collection tombstones,
+     * so this test is not of the utmost importance. Nonetheless, the pre-CASSANDRA-15805 behavior was incorrect and
+     * this ensure it is fixed.
+     */
+    @Test
+    public void testCellGrouperOnNonFecthedCollectionTombstone()
+    {
+        // CREATE TABLE %s (pk int, ck int, a text, b set<text>, c text, PRIMARY KEY (pk, ck))
+        CFMetaData cfm = CFMetaData.Builder.create("ks", "table")
+                                           .addPartitionKey("pk", Int32Type.instance)
+                                           .addClusteringColumn("ck", Int32Type.instance)
+                                           .addRegularColumn("a", UTF8Type.instance)
+                                           .addRegularColumn("b", SetType.getInstance(UTF8Type.instance, true))
+                                           .addRegularColumn("c", UTF8Type.instance)
+                                           .build();
+
+        // Creates a filter that _only_ fetches a and c, but not b.
+        ColumnFilter filter = ColumnFilter.selectionBuilder()
+                                          .add(cfm.getColumnDefinition(bytes("a")))
+                                          .add(cfm.getColumnDefinition(bytes("c")))
+                                          .build();
+        SerializationHelper helper = new SerializationHelper(cfm,
+                                                             MessagingService.VERSION_22,
+                                                             SerializationHelper.Flag.LOCAL,
+                                                             filter);
+        CellGrouper grouper = new CellGrouper(cfm, helper);
+        Clustering clustering = new Clustering(bytes(1));
+
+        // We add a cell for a, then a collection tombstone for b, and then a cell for c (for the same clustering).
+        // All those additions should return 'true' as all belong to the same row.
+        LegacyCell ca = cell(clustering, cfm.getColumnDefinition(bytes("a")), bytes("v1"), 1);
+        assertTrue(grouper.addAtom(ca));
+
+        Slice.Bound startBound = Slice.Bound.inclusiveStartOf(bytes(1));
+        Slice.Bound endBound = Slice.Bound.inclusiveEndOf(bytes(1));
+        ColumnDefinition bDef = cfm.getColumnDefinition(bytes("b"));
+        assert bDef != null;
+        LegacyBound start = new LegacyBound(startBound, false, bDef);
+        LegacyBound end = new LegacyBound(endBound, false, bDef);
+        LegacyRangeTombstone rtb = new LegacyRangeTombstone(start, end, new DeletionTime(1, 1588598040));
+        assertTrue(rtb.isCollectionTombstone()); // Ensure we're testing what we think
+        assertTrue(grouper.addAtom(rtb));
+
+        LegacyCell cc = cell(clustering, cfm.getColumnDefinition(bytes("c")), bytes("v2"), 1);
+        assertTrue(grouper.addAtom(cc));
+    }
 }


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org