You are viewing a plain text version of this content. The canonical link for it is here.
Posted to pr@cassandra.apache.org by GitBox <gi...@apache.org> on 2023/01/06 16:43:55 UTC

[GitHub] [cassandra] blambov commented on a diff in pull request #2065: Cassandra 18134

blambov commented on code in PR #2065:
URL: https://github.com/apache/cassandra/pull/2065#discussion_r1063458672


##########
src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java:
##########
@@ -163,12 +167,14 @@ static class BigVersion extends Version
             hasCommitLogLowerBound = version.compareTo("mb") >= 0;
             hasCommitLogIntervals = version.compareTo("mc") >= 0;
             hasAccurateMinMax = version.compareTo("md") >= 0;
-            hasOriginatingHostId = version.matches("(m[e-z])|(n[b-z])");
+            hasOriginatingHostId = version.matches("(m[e-z])|(n[b-z])|(o[a-z])");

Review Comment:
   This would be easy to forget to update. How about `version.compareTo("nb") >= 0 || version.matches("m[e-z]")`?



##########
src/java/org/apache/cassandra/db/Slice.java:
##########
@@ -229,20 +237,24 @@ public Slice forPaging(ClusteringComparator comparator, Clustering<?> lastReturn
     }
 
     /**
-     * Given the per-clustering column minimum and maximum value a sstable contains, whether or not this slice potentially
-     * intersects that sstable or not.
+     * Whether this slice and the provided slice intersects.
      *
      * @param comparator the comparator for the table this is a slice of.
-     * @param minClusteringValues the smallest values for each clustering column that a sstable contains.
-     * @param maxClusteringValues the biggest values for each clustering column that a sstable contains.
-     *
-     * @return whether the slice might intersects with the sstable having {@code minClusteringValues} and
-     * {@code maxClusteringValues}.
+     * @param other      the other slice to check intersection with.
+     * @return whether this slice intersects {@code other}.
      */
-    public boolean intersects(ClusteringComparator comparator, List<ByteBuffer> minClusteringValues, List<ByteBuffer> maxClusteringValues)
+    public boolean intersects(ClusteringComparator comparator, Slice other)
     {
-        // If this slice starts after max clustering or ends before min clustering, it can't intersect
-        return start.compareTo(comparator, maxClusteringValues) <= 0 && end.compareTo(comparator, minClusteringValues) >= 0;
+        // Empty slices never intersect anything (and we have to special case it as there is many ways to build an
+        // empty slice; for instance, without this, (0, 0) would intersect Slice.ALL or [-1, 1]).
+        if (isEmpty(comparator) || other.isEmpty(comparator))

Review Comment:
   A simpler implementation of intersection when empty slices can occur is to take the intersection and check if it is empty:
   ```
           return comparator.compare(Comparables.max(start, other.start, comparator), 
                                     Comparables.min(end, other.end, comparator)) < 0;
   ```
   This saves one comparison and is no less readable to me.



##########
src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java:
##########
@@ -259,11 +278,52 @@ public MetadataCollector sstableLevel(int sstableLevel)
         return this;
     }
 
-    public MetadataCollector updateClusteringValues(ClusteringPrefix<?> clustering)
+    public void updateClusteringValues(Clustering<?> clustering)
     {
-        minClustering = minClustering == null || comparator.compare(clustering, minClustering) < 0 ? clustering : minClustering;
-        maxClustering = maxClustering == null || comparator.compare(clustering, maxClustering) > 0 ? clustering : maxClustering;
-        return this;
+        if (clustering == Clustering.STATIC_CLUSTERING)
+            return;
+
+        if (!clusteringInitialized)

Review Comment:
   Does this special case exist only to make sure we don't call `retainable` twice? Per the comment below, this is not necessary, and `MAX_START` and `MIN_END` will be greater/smaller than any valid clustering prefix -- this code, its version below and the variable can be removed altogether.



##########
src/java/org/apache/cassandra/db/Slices.java:
##########
@@ -243,17 +249,12 @@ private List<Slice> normalize(List<Slice> slices)
             if (slices.size() <= 1)
                 return slices;
 
-            Collections.sort(slices, new Comparator<Slice>()
-            {
-                @Override
-                public int compare(Slice s1, Slice s2)
-                {
-                    int c = comparator.compare(s1.start(), s2.start());
-                    if (c != 0)
-                        return c;
+            slices.sort((s1, s2) -> {
+                int c = comparator.compare(s1.start(), s2.start());
+                if (c != 0)
+                    return c;
 
-                    return comparator.compare(s1.end(), s2.end());
-                }
+                return comparator.compare(s1.end(), s2.end());

Review Comment:
   Reversing the direction here would be a tad more efficient (it would avoid going through `Slice.make` on 283 below).



##########
src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java:
##########
@@ -186,7 +193,7 @@ private static <V> ClusteringBound<V> createArtificialLowerBound(boolean isRever
     /**
      * @return the lower bound stored on the index entry for this partition, if available.
      */
-    private ClusteringBound<?> getPartitionIndexLowerBound()
+    private ClusteringBound<?> getKeyCacheLowerBound()
     {
         // NOTE: CASSANDRA-11206 removed the lookup against the key-cache as the IndexInfo objects are no longer

Review Comment:
   Isn't this note invalid?
   
   Also, I don't think we want to load the partition if `canUseMetadataLowerBound` is false -- we can just forego the bound in that case.



##########
src/java/org/apache/cassandra/db/ClusteringBound.java:
##########
@@ -78,7 +85,7 @@ default int compareTo(ClusteringComparator comparator, List<ByteBuffer> sstableB
             // Say the slice bound is a start. It means we're in the case where the max
             // sstable bound is say (1:5) while the slice start is (1). So the start
             // does start before the sstable end bound (and intersect it). It's the exact
-            // inverse with a end slice bound.
+            // inverse with an end slice bound.

Review Comment:
   This method is no longer needed.



##########
src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java:
##########
@@ -259,11 +278,52 @@ public MetadataCollector sstableLevel(int sstableLevel)
         return this;
     }
 
-    public MetadataCollector updateClusteringValues(ClusteringPrefix<?> clustering)
+    public void updateClusteringValues(Clustering<?> clustering)
     {
-        minClustering = minClustering == null || comparator.compare(clustering, minClustering) < 0 ? clustering : minClustering;
-        maxClustering = maxClustering == null || comparator.compare(clustering, maxClustering) > 0 ? clustering : maxClustering;
-        return this;
+        if (clustering == Clustering.STATIC_CLUSTERING)
+            return;
+
+        if (!clusteringInitialized)
+        {
+            clusteringInitialized = true;
+            minClustering = clustering.retainable();
+            maxClustering = minClustering;
+        }
+        else if (comparator.compare((ClusteringPrefix<?>) clustering, (ClusteringPrefix<?>) maxClustering) > 0)
+        {
+            maxClustering = clustering.retainable();

Review Comment:
   Why not call `retainable` in `finalizeMetadata` instead of doing it repeatedly here?



##########
src/java/org/apache/cassandra/db/Slice.java:
##########
@@ -164,8 +174,7 @@ else if (cmp > 0)
      * Returns whether a given clustering or bound is included in this slice.

Review Comment:
   Please replace the above with just
   ```
   assert start.isStart() && end.isEnd();
   return comparator.compare(start, end) >= 0;
   // Note: the comparator orders inclusive starts and exclusive ends as equal, and inclusive ends as being greater than starts.
   ```



##########
src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java:
##########
@@ -712,42 +712,57 @@ private UnfilteredRowIterator queryMemtableAndDiskInternal(ColumnFamilyStore cfs
                     break;
                 }
 
-                if (shouldInclude(sstable))
+                boolean intersects = intersects(sstable);
+                boolean hasRequiredStatics = hasRequiredStatics(sstable);
+                boolean hasPartitionLevelDeletions = hasPartitionLevelDeletions(sstable);
+
+                if (!intersects && !hasRequiredStatics && !hasPartitionLevelDeletions)
+                {
+                    nonIntersectingSSTables++;
+                    continue;
+                }
+
+                if (intersects || hasRequiredStatics)
                 {
                     if (!sstable.isRepaired())
                         controller.updateMinOldestUnrepairedTombstone(sstable.getMinLocalDeletionTime());
 
                     // 'iter' is added to iterators which is closed on exception, or through the closing of the final merged iterator
                     @SuppressWarnings("resource")
-                    UnfilteredRowIteratorWithLowerBound iter = makeIterator(cfs, sstable, metricsCollector);
+                    UnfilteredRowIterator iter = intersects ? makeRowIteratorWithLowerBound(cfs, sstable, metricsCollector)
+                                                            : makeRowIteratorWithSkippedNonStaticContent(cfs, sstable, metricsCollector);
+
                     inputCollector.addSSTableIterator(sstable, iter);
                     mostRecentPartitionTombstone = Math.max(mostRecentPartitionTombstone,
                                                             iter.partitionLevelDeletion().markedForDeleteAt());
                 }
                 else
                 {
                     nonIntersectingSSTables++;
-                    // sstable contains no tombstone if maxLocalDeletionTime == Integer.MAX_VALUE, so we can safely skip those entirely
-                    if (sstable.mayHaveTombstones())
+
+                    // if the sstable contained range or cell tombstones, it would intersect; since we are here, it means
+                    // that there are no cell or range tombstones we are interested in (due to the filter)
+                    // however, we know that there are partition level deletions in this sstable and we need to make
+                    // an iterator figure out that (see `StatsMetadata.hasPartitionLevelDeletions`)
+
+                    // 'iter' is added to iterators which is closed on exception, or through the closing of the final merged iterator
+                    @SuppressWarnings("resource")
+                    UnfilteredRowIterator iter = makeRowIteratorWithLowerBound(cfs, sstable, metricsCollector);

Review Comment:
   Would `WithSkippedNonStaticContent` not be preferable?



##########
src/java/org/apache/cassandra/io/sstable/metadata/StatsMetadata.java:
##########
@@ -453,29 +478,30 @@ public StatsMetadata deserialize(Version version, DataInputPlus in) throws IOExc
             int sstableLevel = in.readInt();
             long repairedAt = in.readLong();
 
+            List<AbstractType<?>> clusteringTypes = null;
+            Slice coveredClustering = Slice.ALL;
             if (version.hasImprovedMinMax())
             {
-                throw new UnsupportedEncodingException();
+                clusteringTypes = typeSerializer.deserializeList(in);
+                coveredClustering = Slice.serializer.deserialize(in, version.correspondingMessagingVersion(), clusteringTypes);
             }
-
-            // for legacy sstables, we skip deserializing the min and max clustering value
-            // to prevent erroneously excluding sstables from reads (see CASSANDRA-14861)
-            int colCount = in.readInt();
-            List<ByteBuffer> minClusteringValues = new ArrayList<>(colCount);
-            for (int i = 0; i < colCount; i++)
+            else
             {
-                ByteBuffer val = ByteBufferUtil.readWithShortLength(in);
-                if (version.hasAccurateMinMax())
-                    minClusteringValues.add(val);
-            }
+                // for legacy sstables, we skip deserializing the min and max clustering value

Review Comment:
   I prefer the DSE version of the comment:
   ```
                   // We always deserialize the min/max clustering values if they are there, but we ignore them for
                   // legacy sstables where !hasAccurateMinMax due to CASSANDRA-14861.
   ```



##########
src/java/org/apache/cassandra/db/Slice.java:
##########
@@ -106,6 +109,13 @@ public static Slice make(Clustering<?> start, Clustering<?> end)
         return new Slice(ClusteringBound.inclusiveStartOf(start), ClusteringBound.inclusiveEndOf(end));
     }
 
+    public static Slice make(ClusteringPrefix<?> start, ClusteringPrefix<?> end)
+    {
+        // This doesn't give us what we want with the clustering prefix
+        assert start != Clustering.STATIC_CLUSTERING && end != Clustering.STATIC_CLUSTERING;
+        return make(start.asStartBound(), end.asEndBound());

Review Comment:
   The version above is no longer needed.
   
   However, this (and the version above) does not match
   ```
    * A slice selects every row whose clustering is bigger than the slice start prefix but smaller
    * than the end prefix.
    ```
   and makes me worry that `asStartBound` and `asEndBound` without an inclusiveness parameter are dangerous. The javadoc of all three methods should state very loudly that this is an inclusive conversion for clusterings.



##########
src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java:
##########
@@ -245,21 +252,32 @@ private ClusteringBound<?> getPartitionIndexLowerBound()
      */
     private boolean canUseMetadataLowerBound()
     {
-        // Side-note: pre-2.1 sstable stat file had clustering value arrays whose size may not match the comparator size
-        // and that would break getMetadataLowerBound. We don't support upgrade from 2.0 to 3.0 directly however so it's
-        // not a true concern. Besides, !sstable.mayHaveTombstones already ensure this is a 3.0 sstable anyway.
-        return !sstable.mayHaveTombstones() && !sstable.metadata().isCompactTable();
+        if (sstable.metadata().isCompactTable())

Review Comment:
   The comment on top of this method does not appear to be valid.



##########
src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java:
##########
@@ -245,21 +252,32 @@ private ClusteringBound<?> getPartitionIndexLowerBound()
      */
     private boolean canUseMetadataLowerBound()
     {
-        // Side-note: pre-2.1 sstable stat file had clustering value arrays whose size may not match the comparator size
-        // and that would break getMetadataLowerBound. We don't support upgrade from 2.0 to 3.0 directly however so it's
-        // not a true concern. Besides, !sstable.mayHaveTombstones already ensure this is a 3.0 sstable anyway.
-        return !sstable.mayHaveTombstones() && !sstable.metadata().isCompactTable();
+        if (sstable.metadata().isCompactTable())
+            return false;
+
+        Slices requestedSlices = slices;
+
+        if (requestedSlices.isEmpty())
+            return true;
+
+        if (!isReverseOrder())

Review Comment:
   We need a comment here that if the requested slice falls within the covered slice, there's no need to delay producing data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: pr-unsubscribe@cassandra.apache.org
For additional commands, e-mail: pr-help@cassandra.apache.org