Posted to notifications@accumulo.apache.org by "cshannon (via GitHub)" <gi...@apache.org> on 2023/09/18 17:13:34 UTC

[GitHub] [accumulo] cshannon opened a new pull request, #3761: No chop merge and delete

cshannon opened a new pull request, #3761:
URL: https://github.com/apache/accumulo/pull/3761

   This PR merges the no-chop merge and no-chop/no-split feature into main. There are still some follow-on tasks to do, which I will list and create issues for, but the main functionality is complete, and now that 3.0 has been released I think it's time to get what is here reviewed and merged into main so we can continue on.
   
   The goal here is to prevent having to run chop compactions when merging (or deleting) tablets, which can be quite slow. The other goal is to get rid of having to split tablets during deletions, to simplify the manager code. Instead, the changes here allow RFiles to be fenced off by valid ranges so that we no longer need to compact/split, and when scanning only the valid range is visible. Files can then be cleaned up later with the normal compaction process.
   
   **Notes:** 
   
   1. This PR squashes all the commits from the [no-chop-merge](https://github.com/apache/accumulo/tree/no-chop-merge) branch into one commit to make it much easier to review. I left that original branch alone for now and pushed this to a different branch.
   2. I plan to create a wiki page or website post at some point with full details of how no-chop merge works and the changes (as that write-up would be quite large), but the highlights are below for reviewing the PR:
   
   ### Description of Changes:
   
   **TabletFile**
   
   Tablet files used to be unique by just the path. Now a TabletFile also includes a Range. This allows Accumulo to track the same file more than once, each time with a different associated range. So the same file can have several valid ranges and be treated as different files. The benefit here is that most of the TabletFile handling code can stay the same. If a Range is not provided (null start/end) then the entire file is visible, and this is how backwards compatibility is handled.
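   
   To make this concrete, here is a minimal sketch (illustrative path and rows, using the `StoredTabletFile.of(Path, Range)` factory added in this PR) of the same file being referenced twice with different ranges:
   
   ```java
   import org.apache.accumulo.core.data.Range;
   import org.apache.accumulo.core.metadata.StoredTabletFile;
   import org.apache.hadoop.fs.Path;
   
   // The same physical RFile referenced twice; the Range is now part of the file's identity
   Path path = new Path("hdfs://localhost:8020/accumulo/tables/5/default_tablet/F000000c.rf");
   
   // Infinite range (null start/end): the whole file is visible, the backwards-compatible case
   StoredTabletFile whole = StoredTabletFile.of(path, new Range());
   
   // Fenced reference: only rows up to "m" are visible through this entry
   StoredTabletFile fenced = StoredTabletFile.of(path, new Range(null, "m"));
   
   // Because the range is part of the identity, these are tracked as two different files
   boolean sameFile = whole.equals(fenced); // false
   ```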
   
   **RFiles**
   
   There is now a new RFile iterator/reader that can fence files by taking a collection of ranges and only returning rows that are part of the given range when scanning. Rows outside of the range are skipped and not visible.
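   
   The core idea is to clip the requested scan range against the file's fence before reading, so anything outside the fence is never surfaced. A rough sketch of that clipping using the existing `Range.clip` API (the real iterator, `FencedReader`, is part of this PR and is quoted later in this thread):
   
   ```java
   import org.apache.accumulo.core.data.Range;
   
   // Fence: only rows d through m are valid for this file reference
   Range fence = new Range("d", "m");
   
   // Range the scan actually asked for
   Range requested = new Range("a", "z");
   
   // Clip the request to the fence; returns null when the two do not overlap,
   // in which case this file contributes nothing to the scan
   Range effective = fence.clip(requested, true);
   ```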
   
   **Metadata**
   
   The file metadata that is stored has been changed to now include a range. Originally, each entry just contained the path in the Data File column. Now the value is JSON and includes both a file and a range. This means the same file can be stored more than once in metadata, as the combination of file and range is what makes an entry unique.
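   
   For illustration, a rough sketch of producing the new metadata value with the `serialize` helpers added in this PR (the path is made up; start/end rows are stored as base64-encoded bytes, and null/infinite rows are written as empty arrays):
   
   ```java
   import org.apache.accumulo.core.data.Range;
   import org.apache.accumulo.core.metadata.StoredTabletFile;
   
   String path = "hdfs://localhost:8020/accumulo/tables/5/default_tablet/F000000c.rf";
   
   // Whole file visible: roughly {"path":"hdfs://...F000000c.rf","startRow":"","endRow":""}
   String wholeFileValue = StoredTabletFile.serialize(path);
   
   // Fenced file: same path, but the end row of the range is encoded in the value, so
   // the (path, range) combination is what makes the metadata entry unique
   String fencedValue = StoredTabletFile.serialize(path, new Range(null, "m"));
   ```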
   
   **Merge Strategy**
   
   During a merge, instead of compacting tablets, each tablet that is being merged will have its files fenced by the tablet range. This is a quick operation: it just requires going through the files in metadata, adding the range to each entry, and then adding that entry to the target tablet that files are being merged into. If a range already exists for a file when processing, that file's range is clipped with the tablet range to get the valid overlapping range. Each file will only have one range created.
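   
   A rough sketch of that clipping step (names like `tablet` and `file` are illustrative, not the exact manager code):
   
   ```java
   // Range covered by the tablet being merged away
   Range tabletRange = tablet.getExtent().toDataRange();
   
   // Existing range for this file entry; a previously unfenced file has an infinite range
   Range existingRange = file.getRange();
   
   // Keep only the overlapping portion; clip returns null if there is no overlap
   Range fencedRange = existingRange.clip(tabletRange, true);
   if (fencedRange != null) {
     // Re-add the same file to the target tablet, now fenced to the clipped range
     StoredTabletFile fencedFile = StoredTabletFile.of(file.getPath(), fencedRange);
   }
   ```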
   
   **Delete Strategy**
   
   Deletes are a little more complicated: besides chop compactions when there are shared boundaries (tablets are stretched to cover the deleted range, so we would need to compact), we also need to handle splits. Normally splits are first done at the deletion boundaries so entire tablets can be deleted, but now we can fence off tablets instead of splitting. Deletion now works by figuring out the first/last tablets in the deletion range; instead of splitting or compacting them, the algorithm fences the tablets (if needed) and creates 0, 1, or 2 ranges. Any tablets that exist in the middle are still just deleted like before.
   
   When the deletion range overlaps part of the first or last tablet, we create one valid range for the files in those tablets (covering the part of the file not being deleted). This is done instead of a split. If a deletion range is entirely inside a single tablet, we create two ranges (two entries are added for each existing file) because we need to fence the file by the valid range both before and after the deleted range.
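   
   As a rough sketch of the two-range case (a delete range entirely inside one tablet), with illustrative rows and inclusivity rather than the exact manager logic:
   
   ```java
   import org.apache.accumulo.core.data.Range;
   import org.apache.hadoop.io.Text;
   
   // Rows d through m (inclusive) are being deleted from a single tablet
   Text deleteStart = new Text("d");
   Text deleteEnd = new Text("m");
   
   // Rows strictly before the deleted range remain visible
   Range before = new Range(null, false, deleteStart, false);
   
   // Rows strictly after the deleted range remain visible
   Range after = new Range(deleteEnd, false, null, false);
   
   // Each existing file in the tablet is then re-added twice, once fenced by each
   // surviving range, instead of splitting the tablet at the deletion boundaries
   ```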
   
   **Future Issues to be Done**
   
   1. All of the existing chop-compaction-related code can be removed (or deprecated if it is public API). There's a lot to remove, and it would make this PR much larger, so I figured it was best to do it as a follow-on.
   2. We still need more testing to make sure all edge cases are handled.
   3. Upgrade code needs to be written once the changes are approved.
   4. There might be some simpler things we could do to improve things as follow-ons. One example @keith-turner mentioned: instead of deleting tablets during deletion, we might just remove the files and call merge later. We may also be able to simplify the metadata updates for deletes into one method/loop instead of two. I plan to explore some of this going forward.
   5. All of the changes will need to be merged/refactored into the elasticity branch.
   
   **Included PRs:**
   
   https://github.com/apache/accumulo/pull/3418
   https://github.com/apache/accumulo/pull/3480
   https://github.com/apache/accumulo/pull/3614
   https://github.com/apache/accumulo/pull/3621
   https://github.com/apache/accumulo/pull/3640
   https://github.com/apache/accumulo/pull/3710
   https://github.com/apache/accumulo/pull/3728
   
   This closes #1327 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1334274040


##########
core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java:
##########
@@ -118,11 +148,115 @@ public String toString() {
    * Validates that the provided metadata string for the StoredTabletFile is valid.
    */
   public static void validate(String metadataEntry) {
-    ReferencedTabletFile.parsePath(new Path(URI.create(metadataEntry)));
+    final TabletFileCq tabletFileCq = deserialize(metadataEntry);
+    // Validate the path
+    ReferencedTabletFile.parsePath(deserialize(metadataEntry).path);
+    // Validate the range
+    requireRowRange(tabletFileCq.range);
+  }
+
+  public static StoredTabletFile of(final Text metadataEntry) {
+    return new StoredTabletFile(Objects.requireNonNull(metadataEntry).toString());
   }
 
   public static StoredTabletFile of(final String metadataEntry) {
     return new StoredTabletFile(metadataEntry);
   }
 
+  public static StoredTabletFile of(final URI path, Range range) {
+    return of(new Path(Objects.requireNonNull(path)), range);
+  }
+
+  public static StoredTabletFile of(final Path path, Range range) {
+    return new StoredTabletFile(new TabletFileCq(Objects.requireNonNull(path), range));
+  }
+
+  private static final Gson gson = ByteArrayToBase64TypeAdapter.createBase64Gson();
+
+  private static TabletFileCq deserialize(String json) {
+    final TabletFileCqMetadataGson metadata =
+        gson.fromJson(Objects.requireNonNull(json), TabletFileCqMetadataGson.class);
+    // Recreate the exact Range that was originally stored in Metadata. Stored ranges are originally
+    // constructed with inclusive/exclusive for the start and end key inclusivity settings.
+    // (Except for Ranges with no start/endkey as then the inclusivity flags do not matter)
+    //
+    // With this particular constructor, when setting the startRowInclusive to true and
+    // endRowInclusive to false, both the start and end row values will be taken as is
+    // and not modified and will recreate the original Range.
+    //
+    // This constructor will always set the resulting inclusivity of the Range to be true for the
+    // start row and false for end row regardless of what the startRowInclusive and endRowInclusive
+    // flags are set to.
+    return new TabletFileCq(new Path(URI.create(metadata.path)),
+        new Range(decodeRow(metadata.startRow), true, decodeRow(metadata.endRow), false));
+  }
+
+  public static String serialize(String path) {
+    return serialize(path, new Range());
+  }
+
+  public static String serialize(String path, Range range) {
+    requireRowRange(range);
+    final TabletFileCqMetadataGson metadata = new TabletFileCqMetadataGson();
+    metadata.path = Objects.requireNonNull(path);
+    metadata.startRow = encodeRow(range.getStartKey());
+    metadata.endRow = encodeRow(range.getEndKey());
+
+    return gson.toJson(metadata);
+  }
+
+  private static String serialize(TabletFileCq tabletFileCq) {
+    return serialize(Objects.requireNonNull(tabletFileCq).path.toString(), tabletFileCq.range);
+  }
+
+  /**
+   * Helper methods to encode and decode rows in a range to/from byte arrays. Null rows will just be
+   * returned as an empty byte array
+   **/
+
+  private static byte[] encodeRow(final Key key) {
+    final Text row = key != null ? key.getRow() : null;
+    if (row != null) {
+      try (DataOutputBuffer buffer = new DataOutputBuffer()) {
+        row.write(buffer);
+        return buffer.getData();
+      } catch (IOException e) {
+        throw new UncheckedIOException(e);
+      }

Review Comment:
   @keith-turner - Thanks for the suggestion. I made the changes to the serialization to shrink how much is written and also to disable the HTML escaping. I had also noticed the long output a while back and meant to go back and figure out why, but I forgot because for the last few weeks I've been testing with deserializing back into the Range object so it was human readable (as I noted elsewhere).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1731246402

   > I have been running this on a single node instance and experimenting with it, its really neat and is working really well. I created [this gist](https://gist.github.com/keith-turner/08cb06d84d3040fb60b2bc058b9e9a5d) showing some of the things I tried. One problem I ran into is there is no easy way to see the ranges, I think we need a follow on issue for this. Need something in place for debugging purposes that makes it easy to see the ranges in the metadata table.
   > 
   
   I can create a follow-on issue. For debug logging (if we don't already have the `StoredTabletFile` object loaded) we could deserialize the metadata back into either a `StoredTabletFile` or just into the JSON object used for serialization, `TabletFileCqMetadataGson`, and use that to print it in human-readable form. Reading it back into `TabletFileCqMetadataGson` would be quicker and more efficient than loading an entire `StoredTabletFile` if we just wanted the Range.
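   
   As a rough sketch of the `StoredTabletFile` option (assuming an slf4j logger and the raw metadata column qualifier in hand as a `Text`):
   
   ```java
   // metadataEntry is the raw Data File column qualifier read from the metadata table
   StoredTabletFile stf = StoredTabletFile.of(metadataEntry);
   
   // Range#toString is human readable, e.g. (-inf,m%00; : [] 9223372036854775807 false)
   log.debug("file={} range={}", stf.getFileName(), stf.getRange());
   ```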
   
   For seeing the ranges when writing ITs the simplest thing was to deserialize the metadata entry back into the entire `StoredTabletFile` object and then print the range object and other things. I did this for all of the ITs I modified for merge/delete so I could easily see what was stored. The helper method I used to print the metadata with human readable ranges is part of this PR in [FileMetadata](https://github.com/apache/accumulo/blob/ab3e55147fb11e4df55c7a47680fa3b5f85cee4a/test/src/main/java/org/apache/accumulo/test/util/FileMetadataUtil.java). I just added a log statement where I print the Range and other info: https://github.com/apache/accumulo/blob/ab3e55147fb11e4df55c7a47680fa3b5f85cee4a/test/src/main/java/org/apache/accumulo/test/util/FileMetadataUtil.java#L61-L63
   
   The output looks like the following: `Extent: 5<<; File Name: F000000c.rf; Range: (-inf,m%00; : [] 9223372036854775807 false); Entries: 2, Size: 212`
   
   > I manually exported and imported a table where the files had non infinite ranges and it worked, the ranges made it. Do you know if there are test for this? If not I think we should open a follow on issue to create test for this.
   > 
   I haven't specifically done any testing with import/export, so I can create a follow-on for testing. My assumption is that it should work (along with bulk import) because everything is just treated as a StoredTabletFile in the rest of the code base and the fact that it has a range shouldn't really matter, but we should definitely test it.
   > I manually cloned a table that had files with non infinite ranges and that worked great. If that does not have test, we probably want to open a follow on issue for that.
   > 
   Same thing here; I don't think there's a test (at least I didn't write one). I am thinking I can create an overall top-level issue for no-chop merge testing and we can keep a running list of things to add, with PRs to cover each case, as there are likely several things we want to test with ranges.
   > Noticed MergeState still has WAITING_FOR_CHOPPED state, is that something you were thinking of removing as a follow on? Have you created any issues for known follow on work yet?
   > 
   Right, I figured I would clean up all of the "chopped"-related things in a follow-on. I started working on removing all of that, like `WAITING_FOR_CHOPPED` from MergeState, removing things from Thrift, etc., and the diff was starting to hit another 1000 lines of changes just to remove the chop compaction code. So I figured for now I'd leave it as-is since it doesn't hurt anything (it's just not used), and I plan to create an issue and follow-on PR to remove all of it at one time.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335022327


##########
core/src/main/java/org/apache/accumulo/core/gc/ReferenceFile.java:
##########
@@ -49,16 +67,16 @@ public TableId getTableId() {
   }
 
   @Override
-  public String getMetadataEntry() {
-    return metadataEntry;
+  public String getMetadataPath() {

Review Comment:
   I think the javadoc here needs to be updated and I can do that. I made changes and forgot to update it. The path here could be exactly what's in metadata (such as if it's a `StoredTabletFile`) or may be normalized if the reference is `ScanServerRefTabletFile`.
   
   I don't think it actually needs to be the exact path, since we are not using it to remove entries in the metadata table, and where the path is used later by `GarbageCollectionAlgorithm` it is actually processed and made relative anyway [here](https://github.com/cshannon/accumulo/blob/ecf7c7ac31379b5e5fc95a33df4237927edd8a9c/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java#L175)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335179590


##########
core/src/test/java/org/apache/accumulo/core/file/rfile/FencedRFileTest.java:
##########
@@ -0,0 +1,376 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.file.rfile;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+import org.apache.accumulo.core.crypto.CryptoTest;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.file.FileSKVIterator;
+import org.apache.accumulo.core.file.rfile.RFile.FencedReader;
+import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
+import org.apache.accumulo.core.iteratorsImpl.system.MultiIterator;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.Text;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+
+public class FencedRFileTest extends AbstractRFileTest {

Review Comment:
   This may be similar to https://github.com/apache/accumulo/issues/3448 that was opened; I went ahead and just linked that and added it to the list in #3766 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335023110


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fencedStartKey;
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fencedEndKey;
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+
+    private Key getEndKey(Key key) {
+      // If the key is infinite it will be null, or if inclusive we can just use it as is
+      // as it would be the correct value for getLastKey()
+      if (fence.isInfiniteStopKey() || fence.isEndKeyInclusive()) {
+        return key;
+      }
+
+      // If exclusive we need to strip the last byte to get the last key that is part of the
+      // actual range to return
+      final byte[] ba = key.getRow().getBytes();
+      Preconditions.checkArgument(ba.length > 0 && ba[ba.length - 1] == (byte) 0x00);
+      byte[] fba = new byte[ba.length - 1];
+      System.arraycopy(ba, 0, fba, 0, ba.length - 1);
+
+      return new Key(fba);
+    }
+
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not past the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public FileSKVIterator getIndex() throws IOException {
+      return new FencedIndex(reader.getIndex(), fence);
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      final Reader sample = reader.getSample(sampleConfig);
+      return sample != null ? new FencedReader(sample, fence) : null;
+    }
+
+    @Override
+    public void reset() {
+      reader.reset();
+    }
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb, final TabletFile dataFile)
+      throws IOException {
+    final RFile.Reader reader = new RFile.Reader(Objects.requireNonNull(cb));
+    return dataFile.hasRange() ? new FencedReader(reader, dataFile.getRange()) : reader;
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb, Range range)
+      throws IOException {
+    final RFile.Reader reader = new RFile.Reader(Objects.requireNonNull(cb));
+    return !Objects.requireNonNull(range).isInfiniteStartKey() || !range.isInfiniteStopKey()

Review Comment:
   Fixed in https://github.com/apache/accumulo/pull/3761/commits/706c26702495373f0fa61a5ab6e1aba7b117bd0e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335022973


##########
server/manager/src/test/java/org/apache/accumulo/manager/TabletGroupWatcherTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.manager;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.core.util.Pair;
+import org.junit.jupiter.api.Test;
+
+public class TabletGroupWatcherTest {
+
+  @Test
+  public void testComputeNewDfvEven() {
+    DataFileValue original = new DataFileValue(20, 10, 100);
+    Pair<DataFileValue,DataFileValue> newValues = TabletGroupWatcher.computeNewDfv(original);
+
+    assertEquals(10, newValues.getFirst().getSize());
+    assertEquals(5, newValues.getFirst().getNumEntries());
+    assertEquals(original.getTime(), newValues.getFirst().getTime());
+    assertEquals(10, newValues.getSecond().getSize());
+    assertEquals(5, newValues.getSecond().getNumEntries());
+    assertEquals(original.getTime(), newValues.getSecond().getTime());
+  }
+
+  @Test
+  public void testComputeNewDfvOdd() {
+    DataFileValue original = new DataFileValue(21, 11, 100);
+    Pair<DataFileValue,DataFileValue> newValues = TabletGroupWatcher.computeNewDfv(original);
+
+    assertEquals(10, newValues.getFirst().getSize());
+    assertEquals(5, newValues.getFirst().getNumEntries());
+    assertEquals(original.getTime(), newValues.getFirst().getTime());
+    assertEquals(11, newValues.getSecond().getSize());
+    assertEquals(6, newValues.getSecond().getNumEntries());
+    assertEquals(original.getTime(), newValues.getSecond().getTime());
+  }
+

Review Comment:
   Added in https://github.com/apache/accumulo/pull/3761/commits/706c26702495373f0fa61a5ab6e1aba7b117bd0e



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1724538317

   I merged in main to get #3760 to fix the tests, I just re-kicked the full IT again.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335045312


##########
test/src/main/java/org/apache/accumulo/test/util/FileMetadataUtil.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.test.util;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+
+import org.apache.accumulo.core.data.TableId;
+import org.apache.accumulo.core.metadata.StoredTabletFile;
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata.ColumnType;
+import org.apache.accumulo.core.metadata.schema.TabletsMetadata;
+import org.apache.accumulo.server.ServerContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class FileMetadataUtil {
+
+  private static final Logger log = LoggerFactory.getLogger(FileMetadataUtil.class);
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId) {
+    return printAndVerifyFileMetadata(ctx, tableId, -1);
+  }
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId, int expectedFiles) {
+    final Map<StoredTabletFile,DataFileValue> files = new HashMap<>();
+
+    try (TabletsMetadata tabletsMetadata = ctx.getAmple().readTablets().forTable(tableId)
+        .fetch(ColumnType.FILES, ColumnType.PREV_ROW).build()) {
+
+      // Read each file referenced by that table
+      int i = 0;
+      for (TabletMetadata tabletMetadata : tabletsMetadata) {
+        for (Entry<StoredTabletFile,DataFileValue> fileEntry : tabletMetadata.getFilesMap()
+            .entrySet()) {
+          StoredTabletFile file = fileEntry.getKey();
+          DataFileValue dfv = fileEntry.getValue();
+          files.put(file, dfv);
+          log.debug("Extent: " + tabletMetadata.getExtent() + "; File Name: " + file.getFileName()
+              + "; Range: " + file.getRange() + "; Entries: " + dfv.getNumEntries() + ", Size: "
+              + dfv.getSize());
+          i++;
+        }
+      }
+
+      log.debug("");
+      if (expectedFiles >= 0) {

Review Comment:
   > I was thinking there might be a use case where you'd want to verify the expected files seen was in fact 0
   
   Hah, that is what I wanted to achieve too and the code already did that.  I think I made the right call to stop reviewing this yesterday even though I had not gone through all the files.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335013088


##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -649,16 +648,6 @@ TabletGoalState getGoalState(TabletLocationState tls, MergeInfo mergeInfo) {
               break;
             case STARTED:
             case SPLITTING:

Review Comment:
   Yeah, this is actually already removed in my branch for #3769 which is about ready to be submitted as a PR as soon as this PR is merged. [Here](https://github.com/cshannon/accumulo/commit/54f03a443e55a35cb3bd951ab322fb027a90697a#diff-07984c23da1745b22dd4fde5c4b8877dfb190ea19e5e60ab79d8a93e752031c7L650) is the current line in the work in progress branch that would remove it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335044150


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   > another option I thought of was just caching using Guava memoize which we've done elsewhere. 
   
   I like that solution, multiple benefits w/o any downside.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335038213


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   @keith-turner - another option I thought of was caching using Guava's memoize, which we've done elsewhere. I created a separate branch/commit as a quick example of how this would look and wanted to see what you thought. 
   
   Ie: `this.fencedEndKey = Suppliers.memoize(() -> getEndKey(fence.getEndKey()));`
   
   Full example: https://github.com/cshannon/accumulo/commit/0d6088682fe13ecdd89c05a1b0690de0e19bf6da
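   
   A rough sketch of how that could look inside `FencedFileSKVIterator` (mirroring the inline example above, not necessarily the exact code in the linked commit):
   
   ```java
   // Field changes from Key to a memoized Supplier (java.util.function.Supplier,
   // built with Guava's com.google.common.base.Suppliers.memoize)
   private final Supplier<Key> fencedEndKey;
   
   public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
     this.reader = Objects.requireNonNull(reader);
     this.fence = Objects.requireNonNull(fence);
     this.fencedStartKey = fence.getStartKey();
     // defer the end key computation and cache the result on first use
     this.fencedEndKey = Suppliers.memoize(() -> getEndKey(fence.getEndKey()));
   }
   
   @Override
   public Key getLastKey() throws IOException {
     var rlk = reader.getLastKey();
     return fence.afterEndKey(rlk) ? fencedEndKey.get() : rlk;
   }
   ```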
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1725233860

   > I merged in main to get #3760 to fix the tests, I just re-kicked the full IT again.
   
   Full IT build finished and passed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335011876


##########
test/src/main/java/org/apache/accumulo/test/util/FileMetadataUtil.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.test.util;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+
+import org.apache.accumulo.core.data.TableId;
+import org.apache.accumulo.core.metadata.StoredTabletFile;
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata.ColumnType;
+import org.apache.accumulo.core.metadata.schema.TabletsMetadata;
+import org.apache.accumulo.server.ServerContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class FileMetadataUtil {
+
+  private static final Logger log = LoggerFactory.getLogger(FileMetadataUtil.class);
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId) {
+    return printAndVerifyFileMetadata(ctx, tableId, -1);
+  }
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId, int expectedFiles) {
+    final Map<StoredTabletFile,DataFileValue> files = new HashMap<>();
+
+    try (TabletsMetadata tabletsMetadata = ctx.getAmple().readTablets().forTable(tableId)
+        .fetch(ColumnType.FILES, ColumnType.PREV_ROW).build()) {
+
+      // Read each file referenced by that table
+      int i = 0;
+      for (TabletMetadata tabletMetadata : tabletsMetadata) {
+        for (Entry<StoredTabletFile,DataFileValue> fileEntry : tabletMetadata.getFilesMap()
+            .entrySet()) {
+          StoredTabletFile file = fileEntry.getKey();
+          DataFileValue dfv = fileEntry.getValue();
+          files.put(file, dfv);
+          log.debug("Extent: " + tabletMetadata.getExtent() + "; File Name: " + file.getFileName()
+              + "; Range: " + file.getRange() + "; Entries: " + dfv.getNumEntries() + ", Size: "
+              + dfv.getSize());
+          i++;
+        }
+      }
+
+      log.debug("");
+      if (expectedFiles >= 0) {

Review Comment:
   Yeah, -1 skips the check and just prints. I was trying to provide a quick way to print what's in metadata for debugging a test when you don't need to actually count the files. The reason this check is currently `>= 0` is that there might be a use case where you'd want to verify the expected number of files seen is in fact 0 (an empty tablet) at some point in the future, and if we made the suggested change that case of expecting 0 files wouldn't be checked. So I am thinking we should keep the current check as-is so that case would work.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335043859


##########
core/src/main/java/org/apache/accumulo/core/gc/ReferenceFile.java:
##########
@@ -49,16 +67,16 @@ public TableId getTableId() {
   }
 
   @Override
-  public String getMetadataEntry() {
-    return metadataEntry;
+  public String getMetadataPath() {

Review Comment:
   > I don't think it actually needs to be the exact path since we are not using it to remove entries in the metadata table and where the path is used later by GarbageCollectionAlgorithm it is actually processed and made relative anyways [here](https://github.com/cshannon/accumulo/blob/ecf7c7ac31379b5e5fc95a33df4237927edd8a9c/server/gc/src/main/java/org/apache/accumulo/gc/GarbageCollectionAlgorithm.java#L175)
   
   I agree with that and saw the same thing when I looked.  Based on that it does not seem we need the exact bytes that are stored in the metadata table. 
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1732575410

   I just kicked off one more full IT, and assuming that passes I am planning to merge this today. There are several follow-on issues to complete, and they can't be submitted until this is merged. 3.1 is not going to be released for a while, so there is plenty of time to put in follow-on issues and PRs to address any problems with no-chop merge that come up during testing or later before release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1724028401

   I restarted the full IT build with this new PR


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1333594404


##########
core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java:
##########
@@ -118,11 +148,115 @@ public String toString() {
    * Validates that the provided metadata string for the StoredTabletFile is valid.
    */
   public static void validate(String metadataEntry) {
-    ReferencedTabletFile.parsePath(new Path(URI.create(metadataEntry)));
+    final TabletFileCq tabletFileCq = deserialize(metadataEntry);
+    // Validate the path
+    ReferencedTabletFile.parsePath(deserialize(metadataEntry).path);
+    // Validate the range
+    requireRowRange(tabletFileCq.range);
+  }
+
+  public static StoredTabletFile of(final Text metadataEntry) {
+    return new StoredTabletFile(Objects.requireNonNull(metadataEntry).toString());
   }
 
   public static StoredTabletFile of(final String metadataEntry) {
     return new StoredTabletFile(metadataEntry);
   }
 
+  public static StoredTabletFile of(final URI path, Range range) {
+    return of(new Path(Objects.requireNonNull(path)), range);
+  }
+
+  public static StoredTabletFile of(final Path path, Range range) {
+    return new StoredTabletFile(new TabletFileCq(Objects.requireNonNull(path), range));
+  }
+
+  private static final Gson gson = ByteArrayToBase64TypeAdapter.createBase64Gson();
+
+  private static TabletFileCq deserialize(String json) {
+    final TabletFileCqMetadataGson metadata =
+        gson.fromJson(Objects.requireNonNull(json), TabletFileCqMetadataGson.class);
+    // Recreate the exact Range that was originally stored in Metadata. Stored ranges are originally
+    // constructed with inclusive/exclusive for the start and end key inclusivity settings.
+    // (Except for Ranges with no start/endkey as then the inclusivity flags do not matter)
+    //
+    // With this particular constructor, when setting the startRowInclusive to true and
+    // endRowInclusive to false, both the start and end row values will be taken as is
+    // and not modified and will recreate the original Range.
+    //
+    // This constructor will always set the resulting inclusivity of the Range to be true for the
+    // start row and false for end row regardless of what the startRowInclusive and endRowInclusive
+    // flags are set to.
+    return new TabletFileCq(new Path(URI.create(metadata.path)),
+        new Range(decodeRow(metadata.startRow), true, decodeRow(metadata.endRow), false));
+  }
+
+  public static String serialize(String path) {
+    return serialize(path, new Range());
+  }
+
+  public static String serialize(String path, Range range) {
+    requireRowRange(range);
+    final TabletFileCqMetadataGson metadata = new TabletFileCqMetadataGson();
+    metadata.path = Objects.requireNonNull(path);
+    metadata.startRow = encodeRow(range.getStartKey());
+    metadata.endRow = encodeRow(range.getEndKey());
+
+    return gson.toJson(metadata);
+  }
+
+  private static String serialize(TabletFileCq tabletFileCq) {
+    return serialize(Objects.requireNonNull(tabletFileCq).path.toString(), tabletFileCq.range);
+  }
+
+  /**
+   * Helper methods to encode and decode rows in a range to/from byte arrays. Null rows will just be
+   * returned as an empty byte array
+   **/
+
+  private static byte[] encodeRow(final Key key) {
+    final Text row = key != null ? key.getRow() : null;
+    if (row != null) {
+      try (DataOutputBuffer buffer = new DataOutputBuffer()) {
+        row.write(buffer);
+        return buffer.getData();
+      } catch (IOException e) {
+        throw new UncheckedIOException(e);
+      }

Review Comment:
   This PR does not change the class I modified to avoid escaping characters in the base64. I modified the following class so that common base64 chars are not escaped in the json; the escaping makes it hard to copy them to the command line and run `base64 -d | xxd` on them.
   
   ```diff
   diff --git a/core/src/main/java/org/apache/accumulo/core/util/json/ByteArrayToBase64TypeAdapter.java b/core/src/main/java/org/apache/accumulo/core/util/json/ByteArrayToBase64TypeAdapter.java
   index 86a4541360..79cc4ff661 100644
   --- a/core/src/main/java/org/apache/accumulo/core/util/json/ByteArrayToBase64TypeAdapter.java
   +++ b/core/src/main/java/org/apache/accumulo/core/util/json/ByteArrayToBase64TypeAdapter.java
   @@ -72,7 +72,7 @@ public class ByteArrayToBase64TypeAdapter
       * @return GsonBuilder
       */
      public static GsonBuilder registerBase64TypeAdapter(final GsonBuilder gsonBuilder) {
   -    return Objects.requireNonNull(gsonBuilder).registerTypeHierarchyAdapter(byte[].class,
   +    return Objects.requireNonNull(gsonBuilder).disableHtmlEscaping().registerTypeHierarchyAdapter(byte[].class,
            new ByteArrayToBase64TypeAdapter());
      }
    }
   
   ```
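   
   As an aside for illustration (not code from the PR), a minimal standalone sketch of the behavior this diff changes: by default Gson HTML-escapes characters such as `=` to `\u003d`, while `disableHtmlEscaping()` leaves them alone. The class and sample string below are hypothetical.
   
   ```java
   import com.google.gson.Gson;
   import com.google.gson.GsonBuilder;
   
   public class GsonEscapingDemo {
     public static void main(String[] args) {
       String base64WithPadding = "AjMA=="; // the '=' padding is what gets escaped
   
       Gson escaping = new GsonBuilder().create();
       Gson plain = new GsonBuilder().disableHtmlEscaping().create();
   
       System.out.println(escaping.toJson(base64WithPadding)); // prints "AjMA\u003d\u003d"
       System.out.println(plain.toJson(base64WithPadding));    // prints "AjMA=="
     }
   }
   ```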



##########
core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java:
##########
@@ -118,11 +148,115 @@ public String toString() {
    * Validates that the provided metadata string for the StoredTabletFile is valid.
    */
   public static void validate(String metadataEntry) {
-    ReferencedTabletFile.parsePath(new Path(URI.create(metadataEntry)));
+    final TabletFileCq tabletFileCq = deserialize(metadataEntry);
+    // Validate the path
+    ReferencedTabletFile.parsePath(deserialize(metadataEntry).path);
+    // Validate the range
+    requireRowRange(tabletFileCq.range);
+  }
+
+  public static StoredTabletFile of(final Text metadataEntry) {
+    return new StoredTabletFile(Objects.requireNonNull(metadataEntry).toString());
   }
 
   public static StoredTabletFile of(final String metadataEntry) {
     return new StoredTabletFile(metadataEntry);
   }
 
+  public static StoredTabletFile of(final URI path, Range range) {
+    return of(new Path(Objects.requireNonNull(path)), range);
+  }
+
+  public static StoredTabletFile of(final Path path, Range range) {
+    return new StoredTabletFile(new TabletFileCq(Objects.requireNonNull(path), range));
+  }
+
+  private static final Gson gson = ByteArrayToBase64TypeAdapter.createBase64Gson();
+
+  private static TabletFileCq deserialize(String json) {
+    final TabletFileCqMetadataGson metadata =
+        gson.fromJson(Objects.requireNonNull(json), TabletFileCqMetadataGson.class);
+    // Recreate the exact Range that was originally stored in Metadata. Stored ranges are originally
+    // constructed with inclusive/exclusive for the start and end key inclusivity settings.
+    // (Except for Ranges with no start/endkey as then the inclusivity flags do not matter)
+    //
+    // With this particular constructor, when setting the startRowInclusive to true and
+    // endRowInclusive to false, both the start and end row values will be taken as is
+    // and not modified and will recreate the original Range.
+    //
+    // This constructor will always set the resulting inclusivity of the Range to be true for the
+    // start row and false for end row regardless of what the startRowInclusive and endRowInclusive
+    // flags are set to.
+    return new TabletFileCq(new Path(URI.create(metadata.path)),
+        new Range(decodeRow(metadata.startRow), true, decodeRow(metadata.endRow), false));
+  }
+
+  public static String serialize(String path) {
+    return serialize(path, new Range());
+  }
+
+  public static String serialize(String path, Range range) {
+    requireRowRange(range);
+    final TabletFileCqMetadataGson metadata = new TabletFileCqMetadataGson();
+    metadata.path = Objects.requireNonNull(path);
+    metadata.startRow = encodeRow(range.getStartKey());
+    metadata.endRow = encodeRow(range.getEndKey());
+
+    return gson.toJson(metadata);
+  }
+
+  private static String serialize(TabletFileCq tabletFileCq) {
+    return serialize(Objects.requireNonNull(tabletFileCq).path.toString(), tabletFileCq.range);
+  }
+
+  /**
+   * Helper methods to encode and decode rows in a range to/from byte arrays. Null rows will just be
+   * returned as an empty byte array
+   **/
+
+  private static byte[] encodeRow(final Key key) {
+    final Text row = key != null ? key.getRow() : null;
+    if (row != null) {
+      try (DataOutputBuffer buffer = new DataOutputBuffer()) {
+        row.write(buffer);
+        return buffer.getData();
+      } catch (IOException e) {
+        throw new UncheckedIOException(e);
+      }

Review Comment:
   I was experimenting with this and deleted a range with single character rows from a tablet.  When I looked at the metadata table I noticed it had really long encoded rows like the following.
   
   ```
   1< file:{"path":"hdfs://10.173.114.210:8020/accumulo/tables/1/default_tablet/F0000001.rf","startRow":"","endRow":"AjMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA\\u003d"} []	143,5
   ```
   
   I looked into this and found `buffer.getData()` returns the entire array it was using as a buffer even if not all of that array was used. That is ok because the Text object writes a length, but it makes the base64 longer than it needs to be. I changed the code to the following, which creates a byte array that is the exact size of what was written.
   
   ```suggestion
         try(ByteArrayOutputStream baos = new ByteArrayOutputStream();
         DataOutputStream dos = new DataOutputStream(baos)) {
           row.write(dos);
           dos.close();
           return baos.toByteArray();
         }catch (IOException e) {
           throw new UncheckedIOException(e);
         }
   ```
   
   With the above changes the encoded value looks like the following.
   
   ```
   1< file:{"path":"hdfs://10.173.114.210:8020/accumulo/tables/1/default_tablet/F0000002.rf","startRow":"","endRow":"AjMA"} []	145,5
   ```
   
   I also found a way to prevent the `\\u003d` in the json; I will comment on that elsewhere.
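   
   As a side illustration of the padding issue (not code from the PR; it assumes Hadoop's `DataOutputBuffer` and `Text` APIs), `getData()` hands back the whole backing array while `getLength()` reports the bytes actually written, so copying only `getLength()` bytes yields the compact encoding:
   
   ```java
   import java.io.IOException;
   import java.io.UncheckedIOException;
   import java.util.Arrays;
   
   import org.apache.hadoop.io.DataOutputBuffer;
   import org.apache.hadoop.io.Text;
   
   public class RowEncodingSketch {
     // Returns the whole backing array, which may include unused padding bytes.
     static byte[] encodePadded(Text row) {
       try (DataOutputBuffer buffer = new DataOutputBuffer()) {
         row.write(buffer);
         return buffer.getData();
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     // Copies only the bytes that were actually written.
     static byte[] encodeExact(Text row) {
       try (DataOutputBuffer buffer = new DataOutputBuffer()) {
         row.write(buffer);
         return Arrays.copyOf(buffer.getData(), buffer.getLength());
       } catch (IOException e) {
         throw new UncheckedIOException(e);
       }
     }
   
     public static void main(String[] args) {
       Text row = new Text("3");
       System.out.println(encodePadded(row).length); // size of the backing array, larger than needed
       System.out.println(encodeExact(row).length);  // just the serialized length
     }
   }
   ```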





[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335020034


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   That's a good point; do you think it makes sense to lazily cache the value after computation for repeat calls to `getLastKey()` on the same reader instance?
   
   I haven't looked in detail, but after a quick glance it appears that in the split code where `getLastKey()` is used it will just be called once. I'm wondering, though, if there could be a future use case (or a current one I missed) where the method is called multiple times on the same reader, in which case it would be worthwhile to avoid having to compute and copy the bytes each time.





[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335023188


##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFile.java:
##########
@@ -76,6 +78,38 @@ public interface InputArguments {
      * @return this
      */
     ScannerFSOptions from(String... files);
+
+    /**
+     * Specify FencedPath files to read from. When multiple are specified the {@link Scanner}
+     * constructed will present a merged view.
+     *
+     * @param files one or more FencedPaths to read.
+     * @return this
+     *
+     * @since 3.1.0
+     */
+    ScannerFSOptions from(FencedPath... files);
+
+    /**
+     * @since 3.1.0
+     */
+    class FencedPath {
+      private final Path path;
+      private final Range fence;
+
+      public FencedPath(Path path, Range fence) {
+        this.path = Objects.requireNonNull(path);
+        this.fence = Objects.requireNonNull(fence);

Review Comment:
   I added this and added tests in https://github.com/apache/accumulo/pull/3761/commits/706c26702495373f0fa61a5ab6e1aba7b117bd0e





[GitHub] [accumulo] cshannon merged pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon merged PR #3761:
URL: https://github.com/apache/accumulo/pull/3761




[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335010810


##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/ExternalCompactionMetadata.java:
##########
@@ -146,7 +144,7 @@ public static ExternalCompactionMetadata fromJson(String json) {
     return new ExternalCompactionMetadata(
         jData.inputs.stream().map(StoredTabletFile::new).collect(toSet()),
         jData.nextFiles.stream().map(StoredTabletFile::new).collect(toSet()),
-        new ReferencedTabletFile(new Path(jData.tmp)), jData.compactor,
+        StoredTabletFile.of(jData.tmp).getTabletFile(), jData.compactor,

Review Comment:
   I created #3768 to handle upgrades and I created a task list to track all of the different things that need upgrading. I added this to the task list so we don't forget.





[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1334812101


##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFile.java:
##########
@@ -76,6 +78,38 @@ public interface InputArguments {
      * @return this
      */
     ScannerFSOptions from(String... files);
+
+    /**
+     * Specify FencedPath files to read from. When multiple are specified the {@link Scanner}
+     * constructed will present a merged view.
+     *
+     * @param files one or more FencedPaths to read.
+     * @return this
+     *
+     * @since 3.1.0
+     */
+    ScannerFSOptions from(FencedPath... files);
+
+    /**
+     * @since 3.1.0
+     */
+    class FencedPath {
+      private final Path path;
+      private final Range fence;
+
+      public FencedPath(Path path, Range fence) {
+        this.path = Objects.requireNonNull(path);
+        this.fence = Objects.requireNonNull(fence);

Review Comment:
   It would be good to make this method a bit more strict with the ranges it accepts. At this point a lot of the code expects row ranges, so it would be good to fail early if it's not a row range. Could add to the javadoc for the constructor that only row ranges are accepted.
   
   ```suggestion
           this.fence = AbstractTabletFile.requireRowRange(Objects.requireNonNull(fence));
   ```
   
   For the above change, the method in AbstractTabletFile needs to be made public.
   
   Could add a unit test that passes in a non-row range and asserts an exception is thrown.
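   
   A hypothetical JUnit 5 sketch of such a test (the nested class location, the call to `requireRowRange` in the constructor, and the exact exception type are assumptions, not code from the PR):
   
   ```java
   import static org.junit.jupiter.api.Assertions.assertThrows;
   
   import org.apache.accumulo.core.client.rfile.RFile.InputArguments.FencedPath;
   import org.apache.accumulo.core.data.Key;
   import org.apache.accumulo.core.data.Range;
   import org.apache.hadoop.fs.Path;
   import org.junit.jupiter.api.Test;
   
   public class FencedPathRangeTest {
     @Test
     public void rejectsNonRowRange() {
       // A range built from keys with column family/qualifier set is not a pure row range.
       Range nonRowRange = new Range(new Key("row1", "fam", "qual"), new Key("row2", "fam", "qual"));
       assertThrows(IllegalArgumentException.class, () -> new FencedPath(
           new Path("hdfs://nn/accumulo/tables/1/t-0000000/F0000001.rf"), nonRowRange));
     }
   }
   ```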



##########
test/src/main/java/org/apache/accumulo/test/functional/MergeIT.java:
##########
@@ -102,6 +124,306 @@ public void mergeSize() throws Exception {
     }
   }
 
+  @Test
+  public void noChopMergeTest() throws Exception {

Review Comment:
   The tests that were added are nice and the comments make them easier to understand.



##########
core/src/main/java/org/apache/accumulo/core/gc/ReferenceFile.java:
##########
@@ -49,16 +67,16 @@ public TableId getTableId() {
   }
 
   @Override
-  public String getMetadataEntry() {
-    return metadataEntry;
+  public String getMetadataPath() {

Review Comment:
   Looking at the javadoc for [Reference.getMetadataPath()](https://github.com/apache/accumulo/blob/f501dca59ae47179c6f62a134e44b27a3d34f652/core/src/main/java/org/apache/accumulo/core/gc/Reference.java#L39-L45), it says this data is exactly what is stored in the metadata table. That is not technically true; what is stored in the metadata table is the json. However, I don't think it needs to be exactly what is stored in the metadata table. Need to figure out if there is a need for the exact metadata info or if the javadoc needs to be updated. I suspect the javadoc needs to be updated, but I'm not completely sure. Could open a follow-on issue to investigate this further.



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fencedStartKey;
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fencedEndKey;
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+
+    private Key getEndKey(Key key) {
+      // If they key is infinite it will be null or if inclusive we can just use it as is
+      // as it would be the correct value for getLastKey()
+      if (fence.isInfiniteStopKey() || fence.isEndKeyInclusive()) {
+        return key;
+      }
+
+      // If exclusive we need to strip the last byte to get the last key that is part of the
+      // actual range to return
+      final byte[] ba = key.getRow().getBytes();
+      Preconditions.checkArgument(ba.length > 0 && ba[ba.length - 1] == (byte) 0x00);
+      byte[] fba = new byte[ba.length - 1];
+      System.arraycopy(ba, 0, fba, 0, ba.length - 1);
+
+      return new Key(fba);
+    }
+
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public FileSKVIterator getIndex() throws IOException {
+      return new FencedIndex(reader.getIndex(), fence);
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      final Reader sample = reader.getSample(sampleConfig);
+      return sample != null ? new FencedReader(sample, fence) : null;
+    }
+
+    @Override
+    public void reset() {
+      reader.reset();
+    }
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb, final TabletFile dataFile)
+      throws IOException {
+    final RFile.Reader reader = new RFile.Reader(Objects.requireNonNull(cb));
+    return dataFile.hasRange() ? new FencedReader(reader, dataFile.getRange()) : reader;
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb, Range range)
+      throws IOException {
+    final RFile.Reader reader = new RFile.Reader(Objects.requireNonNull(cb));
+    return !Objects.requireNonNull(range).isInfiniteStartKey() || !range.isInfiniteStopKey()

Review Comment:
   Calling the method on range will throw an NPE if it's null, so the requireNonNull check is redundant.
   
   ```suggestion
       return !range.isInfiniteStartKey() || !range.isInfiniteStopKey()
   ```



##########
test/src/main/java/org/apache/accumulo/test/util/FileMetadataUtil.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.test.util;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import java.util.HashMap;
+import java.util.Map;
+import java.util.Map.Entry;
+
+import org.apache.accumulo.core.data.TableId;
+import org.apache.accumulo.core.metadata.StoredTabletFile;
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata;
+import org.apache.accumulo.core.metadata.schema.TabletMetadata.ColumnType;
+import org.apache.accumulo.core.metadata.schema.TabletsMetadata;
+import org.apache.accumulo.server.ServerContext;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class FileMetadataUtil {
+
+  private static final Logger log = LoggerFactory.getLogger(FileMetadataUtil.class);
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId) {
+    return printAndVerifyFileMetadata(ctx, tableId, -1);
+  }
+
+  public static Map<StoredTabletFile,DataFileValue>
+      printAndVerifyFileMetadata(final ServerContext ctx, TableId tableId, int expectedFiles) {
+    final Map<StoredTabletFile,DataFileValue> files = new HashMap<>();
+
+    try (TabletsMetadata tabletsMetadata = ctx.getAmple().readTablets().forTable(tableId)
+        .fetch(ColumnType.FILES, ColumnType.PREV_ROW).build()) {
+
+      // Read each file referenced by that table
+      int i = 0;
+      for (TabletMetadata tabletMetadata : tabletsMetadata) {
+        for (Entry<StoredTabletFile,DataFileValue> fileEntry : tabletMetadata.getFilesMap()
+            .entrySet()) {
+          StoredTabletFile file = fileEntry.getKey();
+          DataFileValue dfv = fileEntry.getValue();
+          files.put(file, dfv);
+          log.debug("Extent: " + tabletMetadata.getExtent() + "; File Name: " + file.getFileName()
+              + "; Range: " + file.getRange() + "; Entries: " + dfv.getNumEntries() + ", Size: "
+              + dfv.getSize());
+          i++;
+        }
+      }
+
+      log.debug("");
+      if (expectedFiles >= 0) {

Review Comment:
   Seems when this is -1 it just prints. It's not called w/ zero currently but could be in the future.
   
   ```suggestion
         if (expectedFiles > 0) {
   ```



##########
server/manager/src/main/java/org/apache/accumulo/manager/Manager.java:
##########
@@ -649,16 +648,6 @@ TabletGoalState getGoalState(TabletLocationState tls, MergeInfo mergeInfo) {
               break;
             case STARTED:
             case SPLITTING:

Review Comment:
   Is this SPLITTING enum value something that can be removed in a follow-on issue?



##########
server/manager/src/test/java/org/apache/accumulo/manager/TabletGroupWatcherTest.java:
##########
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.manager;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+
+import org.apache.accumulo.core.metadata.schema.DataFileValue;
+import org.apache.accumulo.core.util.Pair;
+import org.junit.jupiter.api.Test;
+
+public class TabletGroupWatcherTest {
+
+  @Test
+  public void testComputeNewDfvEven() {
+    DataFileValue original = new DataFileValue(20, 10, 100);
+    Pair<DataFileValue,DataFileValue> newValues = TabletGroupWatcher.computeNewDfv(original);
+
+    assertEquals(10, newValues.getFirst().getSize());
+    assertEquals(5, newValues.getFirst().getNumEntries());
+    assertEquals(original.getTime(), newValues.getFirst().getTime());
+    assertEquals(10, newValues.getSecond().getSize());
+    assertEquals(5, newValues.getSecond().getNumEntries());
+    assertEquals(original.getTime(), newValues.getSecond().getTime());
+  }
+
+  @Test
+  public void testComputeNewDfvOdd() {
+    DataFileValue original = new DataFileValue(21, 11, 100);
+    Pair<DataFileValue,DataFileValue> newValues = TabletGroupWatcher.computeNewDfv(original);
+
+    assertEquals(10, newValues.getFirst().getSize());
+    assertEquals(5, newValues.getFirst().getNumEntries());
+    assertEquals(original.getTime(), newValues.getFirst().getTime());
+    assertEquals(11, newValues.getSecond().getSize());
+    assertEquals(6, newValues.getSecond().getNumEntries());
+    assertEquals(original.getTime(), newValues.getSecond().getTime());
+  }
+

Review Comment:
   ```suggestion
     @Test
     public void testComputeNewDfvSmall() {
       DataFileValue original = new DataFileValue(1, 2, 100);
       Pair<DataFileValue,DataFileValue> newValues = TabletGroupWatcher.computeNewDfv(original);
   
       assertEquals(1, newValues.getFirst().getSize());
       assertEquals(1, newValues.getFirst().getNumEntries());
       assertEquals(original.getTime(), newValues.getFirst().getTime());
       assertEquals(1, newValues.getSecond().getSize());
       assertEquals(1, newValues.getSecond().getNumEntries());
       assertEquals(original.getTime(), newValues.getSecond().getTime());
     }
   ```



##########
core/src/main/java/org/apache/accumulo/core/metadata/schema/ExternalCompactionMetadata.java:
##########
@@ -146,7 +144,7 @@ public static ExternalCompactionMetadata fromJson(String json) {
     return new ExternalCompactionMetadata(
         jData.inputs.stream().map(StoredTabletFile::new).collect(toSet()),
         jData.nextFiles.stream().map(StoredTabletFile::new).collect(toSet()),
-        new ReferencedTabletFile(new Path(jData.tmp)), jData.compactor,
+        StoredTabletFile.of(jData.tmp).getTabletFile(), jData.compactor,

Review Comment:
   If there is external compaction metadata from an upgrade it may blow up here. Need a follow on issue to delete external compaction entries on upgrade.



##########
server/base/src/test/java/org/apache/accumulo/server/constraints/MetadataConstraintsTest.java:
##########
@@ -154,9 +157,15 @@ public void testBulkFileCheck() {
 
     // inactive txid
     m = new Mutation(new Text("0;foo"));
-    m.put(BulkFileColumnFamily.NAME, new Text("hdfs://nn1/a/accumulo/tables/2b/t-001/someFile"),
+    m.put(BulkFileColumnFamily.NAME,

Review Comment:
   As a follow-on, could add tests for the following metadata constraint cases:
   
    * old style file insert
    * insert with incorrect json (like the startRow is missing)



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   I would suggest moving this computation of the fencedEndKey to the getLastKey() method. The reason is that not every scan will need this; only things like splitting a tablet will use it. So I suspect it's being frequently computed but not frequently used. Maybe the fencedEndKey instance var could be dropped.
   
   ```suggestion
   ```



##########
server/tserver/src/main/java/org/apache/accumulo/tserver/compactions/ExternalCompactionJob.java:
##########
@@ -76,7 +76,7 @@ public TExternalCompactionJob toThrift() {
 
     List<InputFile> files = jobFiles.entrySet().stream().map(e -> {
       var dfv = e.getValue();
-      return new InputFile(e.getKey().getNormalizedPathStr(), dfv.getSize(), dfv.getNumEntries(),
+      return new InputFile(e.getKey().getMetadata(), dfv.getSize(), dfv.getNumEntries(),

Review Comment:
   As a follow-on it would be good to have an IT that forces an external compaction on a tablet that has files with ranges. Construct the situation such that if the external compaction does not honor the ranges it would end up creating a file with more data than it should have.



##########
server/base/src/main/java/org/apache/accumulo/server/fs/VolumeUtil.java:
##########
@@ -53,29 +54,29 @@ public static String removeTrailingSlash(String path) {
   }
 
   public static Path removeTrailingSlash(Path path) {
-    String pathStr = path.toString();
+    String pathStr = Objects.requireNonNull(path).toString();

Review Comment:
   There are volume replacement ITs; it may be worthwhile having those exercise replacing a volume where there are tablets using files on that volume with non-infinite ranges. This could be another follow-on testing issue.



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fencedStartKey;
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fencedEndKey;
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+
+    private Key getEndKey(Key key) {
+      // If they key is infinite it will be null or if inclusive we can just use it as is
+      // as it would be the correct value for getLastKey()
+      if (fence.isInfiniteStopKey() || fence.isEndKeyInclusive()) {
+        return key;
+      }
+
+      // If exclusive we need to strip the last byte to get the last key that is part of the
+      // actual range to return
+      final byte[] ba = key.getRow().getBytes();
+      Preconditions.checkArgument(ba.length > 0 && ba[ba.length - 1] == (byte) 0x00);
+      byte[] fba = new byte[ba.length - 1];
+      System.arraycopy(ba, 0, fba, 0, ba.length - 1);
+
+      return new Key(fba);
+    }
+
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public FileSKVIterator getIndex() throws IOException {
+      return new FencedIndex(reader.getIndex(), fence);
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      final Reader sample = reader.getSample(sampleConfig);
+      return sample != null ? new FencedReader(sample, fence) : null;

Review Comment:
   As a follow-on, could have a test of sampling that covers the case of files with non-infinite ranges.





[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335038213


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   @keith-turner - another option I thought of was just caching using Guava memoize which we've done elsewhere. I created a separate branch/commit as a quick example of how this would look and wanted to see what you thought: https://github.com/cshannon/accumulo/commit/d871e466ac90a02dfd78fdbda90ca4a1b0d8cfa7





[GitHub] [accumulo] keith-turner commented on pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1732370275

   > I realized a compaction running could throw this off so I am wondering if I need to update and make sure the ITs won't ever run a compaction or if it's not necessary as there's not enough data being written. Probably wouldn't hurt to disable just so things are deterministic.
   
   Yeah, it is nice to avoid non-deterministic test failures. Could up the compaction ratio on the table to higher than the number of files created to avoid this. If creating fewer files than the default compaction ratio it should not be a problem.
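   
   A minimal sketch of bumping that ratio from a test (assumes an `AccumuloClient` can be built from a client properties file; the property involved is `table.compaction.major.ratio`, exposed as `Property.TABLE_MAJC_RATIO`):
   
   ```java
   import org.apache.accumulo.core.client.Accumulo;
   import org.apache.accumulo.core.client.AccumuloClient;
   import org.apache.accumulo.core.conf.Property;
   
   public class RaiseCompactionRatio {
     // clientPropsPath and tableName stand in for whatever the IT already uses.
     public static void raiseRatio(String clientPropsPath, String tableName) throws Exception {
       try (AccumuloClient client = Accumulo.newClient().from(clientPropsPath).build()) {
         // Set the ratio higher than the number of files the test creates so the
         // system never initiates a major compaction on its own during the test.
         client.tableOperations().setProperty(tableName,
             Property.TABLE_MAJC_RATIO.getKey(), "100");
       }
     }
   }
   ```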




[GitHub] [accumulo] keith-turner commented on a diff in pull request #3761: No-chop merge and delete

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335055861


##########
core/src/test/java/org/apache/accumulo/core/file/rfile/FencedRFileTest.java:
##########
@@ -0,0 +1,376 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   https://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.accumulo.core.file.rfile;
+
+import static org.junit.jupiter.api.Assertions.assertEquals;
+import static org.junit.jupiter.api.Assertions.assertFalse;
+import static org.junit.jupiter.api.Assertions.assertTrue;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Optional;
+
+import org.apache.accumulo.core.crypto.CryptoTest;
+import org.apache.accumulo.core.data.Key;
+import org.apache.accumulo.core.data.Range;
+import org.apache.accumulo.core.data.Value;
+import org.apache.accumulo.core.file.FileSKVIterator;
+import org.apache.accumulo.core.file.rfile.RFile.FencedReader;
+import org.apache.accumulo.core.iterators.SortedKeyValueIterator;
+import org.apache.accumulo.core.iteratorsImpl.system.MultiIterator;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.io.Text;
+import org.junit.jupiter.api.BeforeAll;
+import org.junit.jupiter.api.Test;
+
+public class FencedRFileTest extends AbstractRFileTest {

Review Comment:
   Ran these unit tests w/ code coverage and noticed some of the new code was not covered. Added an item to the list on #3766 about this.



##########
core/src/main/java/org/apache/accumulo/core/metadata/ReferencedTabletFile.java:
##########
@@ -186,48 +195,35 @@ public String getNormalizedPathStr() {
     return parts.getNormalizedPath();
   }
 
-  /**
-   * Return a string for inserting a new tablet file.
-   */
-  public String getMetaInsert() {
-    return parts.getNormalizedPath();
-  }
-
-  /**
-   * Return a new Text object of {@link #getMetaInsert()}
-   */
-  public Text getMetaInsertText() {
-    return new Text(getMetaInsert());
-  }
-
   /**
    * New file was written to metadata so return a StoredTabletFile
    */
   public StoredTabletFile insert() {
-    return new StoredTabletFile(parts.getNormalizedPath());
+    return StoredTabletFile.of(getPath(), getRange());
   }
 
   @Override
   public int compareTo(ReferencedTabletFile o) {
     if (equals(o)) {
       return 0;
     } else {
-      return parts.getNormalizedPath().compareTo(o.parts.getNormalizedPath());
+      return comparator.compare(this, o);
     }

Review Comment:
   This was an existing pattern in the code, but it may be better to skip the equals call. In the not-equal case it will read the data twice.





[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335038213


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1556,5 +1561,230 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+    private final Key fencedStartKey;
+    private final Key fencedEndKey;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = Objects.requireNonNull(fence);
+      this.fencedStartKey = fence.getStartKey();
+      this.fencedEndKey = getEndKey(fence.getEndKey());

Review Comment:
   @keith-turner - another option I thought of was just caching using Guava memoize which we've done elsewhere. I created a separate [commit](https://github.com/cshannon/accumulo/commit/0d6088682fe13ecdd89c05a1b0690de0e19bf6da) as a quick example of how this would look and wanted to see what you thought. 
   
   Ie: `this.fencedEndKey = Suppliers.memoize(() -> getEndKey(fence.getEndKey()));`
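   
   A small, self-contained sketch of that memoize pattern (types simplified to `String`; this is not the actual RFile code):
   
   ```java
   import java.util.function.Supplier;
   
   import com.google.common.base.Suppliers;
   
   class LazyEndKeySketch {
     private final Supplier<String> fencedEndKey;
   
     LazyEndKeySketch(String fenceEndKey) {
       // Computed at most once, on the first call to getLastKey(), then cached.
       this.fencedEndKey = Suppliers.memoize(() -> computeEndKey(fenceEndKey));
     }
   
     private String computeEndKey(String key) {
       // stand-in for the copy/strip-last-byte logic in the real reader
       return key;
     }
   
     String getLastKey() {
       return fencedEndKey.get();
     }
   }
   ```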
   
   





[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1335180113


##########
core/src/main/java/org/apache/accumulo/core/metadata/ReferencedTabletFile.java:
##########
@@ -186,48 +195,35 @@ public String getNormalizedPathStr() {
     return parts.getNormalizedPath();
   }
 
-  /**
-   * Return a string for inserting a new tablet file.
-   */
-  public String getMetaInsert() {
-    return parts.getNormalizedPath();
-  }
-
-  /**
-   * Return a new Text object of {@link #getMetaInsert()}
-   */
-  public Text getMetaInsertText() {
-    return new Text(getMetaInsert());
-  }
-
   /**
    * New file was written to metadata so return a StoredTabletFile
    */
   public StoredTabletFile insert() {
-    return new StoredTabletFile(parts.getNormalizedPath());
+    return StoredTabletFile.of(getPath(), getRange());
   }
 
   @Override
   public int compareTo(ReferencedTabletFile o) {
     if (equals(o)) {
       return 0;
     } else {
-      return parts.getNormalizedPath().compareTo(o.parts.getNormalizedPath());
+      return comparator.compare(this, o);
     }

Review Comment:
   Good point, I can drop the equals for sure; just an oversight, good catch. It should just be:
   
   ```java
     @Override
     public int compareTo(ReferencedTabletFile o) {
       return comparator.compare(this, o);
     }
   ```





[GitHub] [accumulo] cshannon commented on pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1732583551

   > > I realized a compaction running could throw this off so I am wondering if I need to update and make sure the ITs won't ever run a compaction or if it's not necessary as there's not enough data being written. Probably wouldn't hurt to disable just so things are deterministic.
   > 
   > Yeah it is nice to avoid non deterministic test failures. Could up the compaction ratio on the table to higher than the number of files created to avoid this. If creating less files than the default for the compaction ratio it should not be a problem.
   
   Looks like I did do this in one [test](https://github.com/cshannon/accumulo/blob/3673ff8c297bdccac9bc0f55fb726244e8157d7d/test/src/main/java/org/apache/accumulo/test/functional/MergeIT.java#L201) but I need to go back and make sure other tests have it too; I will add it to #3766.




[GitHub] [accumulo] cshannon commented on a diff in pull request #3761: No chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#discussion_r1334276138


##########
core/src/main/java/org/apache/accumulo/core/metadata/StoredTabletFile.java:
##########
@@ -118,11 +148,115 @@ public String toString() {
    * Validates that the provided metadata string for the StoredTabletFile is valid.
    */
   public static void validate(String metadataEntry) {
-    ReferencedTabletFile.parsePath(new Path(URI.create(metadataEntry)));
+    final TabletFileCq tabletFileCq = deserialize(metadataEntry);
+    // Validate the path
+    ReferencedTabletFile.parsePath(deserialize(metadataEntry).path);
+    // Validate the range
+    requireRowRange(tabletFileCq.range);
+  }
+
+  public static StoredTabletFile of(final Text metadataEntry) {
+    return new StoredTabletFile(Objects.requireNonNull(metadataEntry).toString());
   }
 
   public static StoredTabletFile of(final String metadataEntry) {
     return new StoredTabletFile(metadataEntry);
   }
 
+  public static StoredTabletFile of(final URI path, Range range) {
+    return of(new Path(Objects.requireNonNull(path)), range);
+  }
+
+  public static StoredTabletFile of(final Path path, Range range) {
+    return new StoredTabletFile(new TabletFileCq(Objects.requireNonNull(path), range));
+  }
+
+  private static final Gson gson = ByteArrayToBase64TypeAdapter.createBase64Gson();
+
+  private static TabletFileCq deserialize(String json) {
+    final TabletFileCqMetadataGson metadata =
+        gson.fromJson(Objects.requireNonNull(json), TabletFileCqMetadataGson.class);
+    // Recreate the exact Range that was originally stored in Metadata. Stored ranges are originally
+    // constructed with inclusive/exclusive for the start and end key inclusivity settings.
+    // (Except for Ranges with no start/endkey as then the inclusivity flags do not matter)
+    //
+    // With this particular constructor, when setting the startRowInclusive to true and
+    // endRowInclusive to false, both the start and end row values will be taken as is
+    // and not modified and will recreate the original Range.
+    //
+    // This constructor will always set the resulting inclusivity of the Range to be true for the
+    // start row and false for end row regardless of what the startRowInclusive and endRowInclusive
+    // flags are set to.
+    return new TabletFileCq(new Path(URI.create(metadata.path)),
+        new Range(decodeRow(metadata.startRow), true, decodeRow(metadata.endRow), false));
+  }
+
+  public static String serialize(String path) {
+    return serialize(path, new Range());
+  }
+
+  public static String serialize(String path, Range range) {
+    requireRowRange(range);
+    final TabletFileCqMetadataGson metadata = new TabletFileCqMetadataGson();
+    metadata.path = Objects.requireNonNull(path);
+    metadata.startRow = encodeRow(range.getStartKey());
+    metadata.endRow = encodeRow(range.getEndKey());
+
+    return gson.toJson(metadata);
+  }
+
+  private static String serialize(TabletFileCq tabletFileCq) {
+    return serialize(Objects.requireNonNull(tabletFileCq).path.toString(), tabletFileCq.range);
+  }
+
+  /**
+   * Helper methods to encode and decode rows in a range to/from byte arrays. Null rows will just be
+   * returned as an empty byte array
+   **/
+
+  private static byte[] encodeRow(final Key key) {
+    final Text row = key != null ? key.getRow() : null;
+    if (row != null) {
+      try (DataOutputBuffer buffer = new DataOutputBuffer()) {
+        row.write(buffer);
+        return buffer.getData();
+      } catch (IOException e) {
+        throw new UncheckedIOException(e);
+      }

Review Comment:
   As a side note I forgot to include co-author tags for @keith-turner and @ctubbsii when I made this new PR and squashed the previous branch. I will try and remember to make sure they are included whenever we squash this PR and merge it into main.





[GitHub] [accumulo] cshannon commented on pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1732345148

   @keith-turner - I added the 4 tests you suggested to the list here https://github.com/apache/accumulo/issues/3766 . I figured the simplest thing to do was just have 1 issue that tracks all the tests to add and create PRs for each one so we don't end up with a ton of separate issues.
   
   Your external compaction test suggestion made me think of something else. In some of the tests for Merge/Delete I am counting and verifying that the metadata files are the expected count to make sure things look ok in the table. I realized a compaction running could throw this off so I am wondering if I need to update and make sure the ITs won't ever run a compaction or if it's not necessary as there's not enough data being written. Probably wouldn't hurt to disable just so things are deterministic.




[GitHub] [accumulo] cshannon commented on pull request #3761: No-chop merge and delete

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3761:
URL: https://github.com/apache/accumulo/pull/3761#issuecomment-1732634925

   > I just kicked off one more full IT and assuming that passes I am planning to merge this today. There are several follow on issues to complete (including a major one of merging into elasticity) and they can't be completed until this is merged. 3.1 is not going to be released for a while so there is plenty of time to put in follow on issues and PRs to address any issues with no-chop merge that comes up during testing or later before release.
   
   The only thing that failed was GarbageCollectorIT, due to updates from #3738 which have now been fixed.

