You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "cshannon (via GitHub)" <gi...@apache.org> on 2023/05/20 20:17:12 UTC

[GitHub] [accumulo] cshannon opened a new pull request, #3418: WIP - Add an optional fenced range to Tablet files

cshannon opened a new pull request, #3418:
URL: https://github.com/apache/accumulo/pull/3418

   This allows treating each fenced range of an RFile as a separate TabletFile for reading purposes. This PR is part of #1327 and the latest attempt to add fencing to an RFile. The changes here build off of the changes in #3401 by adding a Range to `AbstractTabletFile`. This allows `RFileOperations` to easily access the Range and wrap the Reader inside a FencedReader.
   
   The idea here is to associate a range/fence with a TabletFile so that we can easily treat the combination of an RFile and fence as a unique file which means less changes to the rest of the code base when we have multiple ranges for a single file as the code just thinks they are unique files. For more information see the comment [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509174746) and [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509338370).
   
   So, for example, if we had 5 ranges defined for an RFile we'd load up 5 "files" that were fenced off by each range and the rest of the code would just get a list of 5 readers and wouldn't know that they were actually the same file and wouldn't care when iterating. The 5 fenced files (that are really just subsets of the same file) are treated identical by everywhere else in the code as 5 unique files. 
   
   One thing to note is that inside FileManager we track reserved readers by TabletFile so each unique range for the same file would get its own reader in the cache. This should be fine as we want to treat them as unique and actual file data on disk is still cached by the block cache and won't be duplicated if multiple ranges. We still want to limit the number iterators/scans at one time even if it's the same file. In fact, this isn't new as we already do this. FileManager previously already supported readers for the same file in case there are multiple concurrent reads, this just now also supports another way to have a reference to the same file.
   
   I marked this as a work in progress for now as I wasn't sure how much to do update in this PR vs future PRs. The main purpose of this PR is just to add the fencing iterator but I also updated FileOperations and RFileScanner to use it just to demonstrate it works.
   
   PR includes the following:
   
   1. An iterator to fence off an RFile by range
   2. An iterator to also fence off an RFile index
   3. There is a test class that demonstrates the fencing called FencedRFileTest
   4.  RFileScanner was updated so clients can also pass a range for an RFile. The matching classes (RFileScannerBuilder, etc) were updated as well. Two tests were added to demonstrate fencing in RFileClientTest. One demonstrates using the client Scanner and the other uses FileOperations. The FileOperations test probably belong somewhere else but this was mostly just to demonstrate it works. Note that a RFileScanner for clients already takes a range but that range is an overall range across multiple files where as this shows passing a range per RFile. Ultimately we may decide we don't need to fence RFileScanner but it demonstrates we can if we want to.
   
   There is more work to do in this PR and/or follow on PRs:
   1. Make sure all places that need to fence can read an RFile also pass in a range for the fenced iterator
   2. Add some tests to verify the changes in RFileOperations and FileManager work when opening a ranged file
   3. I will create a separate PR to handle writing the new DataFileValue metadata which will include adding a range to the CQ and storing a separate DFV for each combination of file and range. 
   4. After we can persist the ranges we need to update everywhere that uses a reader to be able to pass in ranges (compaction, scanners, etc) and it would be good to have some ITs to show metadata table changes can contain ranges and be read and fence off files
   5. Actually update the merge code to use all the changes


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3418: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#issuecomment-1573721761

   @keith-turner - I pushed a new update to address the public API issues you identified, I also created a new [branch](https://github.com/apache/accumulo/tree/no-chop-merge) to merge this PR into when ready so I can keep making progress with follow on PRs without merging into main yet (as I mentioned possibly doing in my last comment). 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3418: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#discussion_r1213781360


##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFile.java:
##########
@@ -76,6 +77,15 @@ public interface InputArguments {
      * @return this
      */
     ScannerFSOptions from(String... files);
+
+    /**
+     * Specify RFiles to read from. When multiple are specified the {@link Scanner} constructed will
+     * present a merged view.
+     *
+     * @param files one or more RFiles to read.
+     * @return this
+     */
+    ScannerFSOptions from(FencedRfile... files);

Review Comment:
   Needs a since tag in javadoc



##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileSource.java:
##########
@@ -29,10 +32,12 @@
 public class RFileSource {
   private final InputStream in;
   private final long len;
+  private final Range range;
 
-  public RFileSource(InputStream in, long len) {
-    this.in = in;
+  public RFileSource(InputStream in, long len, Range range) {

Review Comment:
   This is a breaking API change.  Need to add a new overloaded constructors that take a range.



##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileScannerBuilder.java:
##########
@@ -150,4 +162,24 @@ public ScannerOptions withBounds(Range range) {
     this.opts.bounds = range;
     return this;
   }
+
+  // UnreferencedTabletFile is not allowed here
+  // as part of the public API so we need a new object
+  public static class FencedRfile {

Review Comment:
   If in public API it needs a since tag.
   
   The class RFileScannerBuilder is package private. I think it contains implementation code behind the public API, but the type RFileScannerBuilder is not in the public API. This FencedRfile class should probably move into RFile or RFile.InputArguments.
   
   Can not wait until we can use Java's new record type for classes like this.



##########
core/src/main/java/org/apache/accumulo/core/metadata/AbstractTabletFile.java:
##########
@@ -53,4 +56,21 @@ public Path getPath() {
     return path;
   }
 
+  /**
+   * @return The range of the TabletFile
+   *
+   * @since 3.1.0

Review Comment:
   This is not in public API so since tag is not needed.



##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileSource.java:
##########
@@ -42,4 +47,8 @@ public InputStream getInputStream() {
   public long getLength() {
     return len;
   }
+
+  public Range getRange() {

Review Comment:
   needs since tag in javadoc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3418: Add an optional range to Tablet files and fenced reader

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#discussion_r1214827264


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,213 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {

Review Comment:
   I ran some of the unit test w/ coverage and it seems llke this method and getLastKey() were not covered in unit test.



##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFile.java:
##########
@@ -76,6 +78,38 @@ public interface InputArguments {
      * @return this
      */
     ScannerFSOptions from(String... files);
+
+    /**
+     * Specify FencedRfiles to read from. When multiple are specified the {@link Scanner}
+     * constructed will present a merged view.
+     *
+     * @param files one or more FencedRfiles to read.
+     * @return this
+     *
+     * @since 3.1.0
+     */
+    ScannerFSOptions from(FencedRfile... files);
+
+    /**
+     * @since 3.1.0
+     */
+    class FencedRfile {

Review Comment:
   Could change this to match the camel case of the other code.
   
   ```suggestion
       class FencedRFile {
   ```
   
   Since this class is an inner class of RFile, including RFile in the name may be redundant.  Could name it something like
   
   ```suggestion
       class FencedPath {
   ```
   



##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFileSource.java:
##########
@@ -29,10 +32,16 @@
 public class RFileSource {
   private final InputStream in;
   private final long len;
+  private final Range range;
 
   public RFileSource(InputStream in, long len) {
-    this.in = in;
+    this(in, len, new Range());
+  }
+
+  public RFileSource(InputStream in, long len, Range range) {

Review Comment:
   ```suggestion
     /**
      * @since 3.1.0
      */
     public RFileSource(InputStream in, long len, Range range) {
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,213 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public void next() throws IOException {
+      super.next();
+    }

Review Comment:
   Can this be removed?
   
   ```suggestion
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,213 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public void next() throws IOException {
+      super.next();
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public FileSKVIterator getIndex() throws IOException {
+      return new FencedIndex(reader.getIndex(), fence);
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      final Reader sample = reader.getSample(sampleConfig);

Review Comment:
   Didn't see test coverage on this either



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,213 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public void next() throws IOException {
+      super.next();
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);

Review Comment:
   Did not see this covered in unit test.  Would be nice to cover this and esnure the deep copy is independent (seeking it should not affect the original and visa versa) and the fence is properly passed.



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,213 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && fence.beforeStartKey(source.getTopKey())) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      return source.hasTop() && !fence.afterEndKey(source.getTopKey());
+    }
+
+    @Override
+    public void next() throws IOException {
+      super.next();
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public SortedKeyValueIterator<Key,Value> deepCopy(IteratorEnvironment env) {
+      throw new UnsupportedOperationException();
+    }
+  }
+
+  static class FencedReader extends FencedFileSKVIterator implements RFileSKVIterator {
+
+    private final Reader reader;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      super(reader, seekFence);
+      this.reader = reader;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.reset();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public FileSKVIterator getIndex() throws IOException {
+      return new FencedIndex(reader.getIndex(), fence);
+    }
+
+    @Override
+    public FileSKVIterator getSample(SamplerConfigurationImpl sampleConfig) {
+      final Reader sample = reader.getSample(sampleConfig);
+      return sample != null ? new FencedReader(sample, fence) : null;
+    }
+
+    @Override
+    public void reset() {
+      reader.reset();
+    }
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb,
+      final AbstractTabletFile<?> dataFile) throws IOException {
+    final RFile.Reader reader = new RFile.Reader(Objects.requireNonNull(cb));
+    return dataFile.hasRange() ? new FencedReader(reader, dataFile.getRange()) : reader;
+  }
+
+  public static RFileSKVIterator getReader(final CachableBuilder cb, Range range)

Review Comment:
   If possible it would be nice if the unit test called this.  If its not workable don't worry about it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3418: Add an optional range to Tablet files and fenced reader

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#issuecomment-1574343833

   @keith-turner - Sounds good, I''ll merge this and create a new issue for adding tests. This way I can continue on and create a new PR for my work to store ranges along with file metadata when ready.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon merged pull request #3418: Add an optional range to Tablet files and fenced reader

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon merged PR #3418:
URL: https://github.com/apache/accumulo/pull/3418


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3418: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#issuecomment-1573682533

   > This looks good. I have not reviewed everything, still want to look over the testing changes. I should be able to do that tomorrow.
   
   Sounds good. So we could in theory start merging in these no-chop merge changes this into main as they are finished/read as long as it wouldn't break existing behavior (and this doesn't as the fencing won't be used yet). But I am leaning more towards the option of creating another branch in the repo that we can use for no-chop merge changes like this and follow on PRs. (just like we have one for elasticity).
   
   I figure once things look good, then we can merge everything back to main. This seems like the best approach for now as I'm hesitant to merge anything into main yet until more follow on work is complete, like updates to how we store metadata and actual merge algorithm changes, because things are likely to need to be tweaked a bit. Also I think all of this stuff should be targeted for after 3.0 (3.1 if that is the next version).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3418: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#discussion_r1206179041


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,218 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && source.getTopKey().compareTo(fence.getStartKey()) <= 0) {

Review Comment:
   The following is safer, properly handles any issues with inclusive and exclusive start key.
   
   ```suggestion
           while (source.hasTop() &&  fence.beforeStartKey(source.getTopKey())) {
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,5 +1558,218 @@ private void setInterruptFlagInternal(AtomicBoolean flag) {
     public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
+
+    @Override
+    public void reset() {
+      clear();
+    }
+  }
+
+  public interface RFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+
+    void reset();
+  }
+
+  static abstract class FencedFileSKVIterator implements FileSKVIterator {
+
+    private final FileSKVIterator reader;
+    protected final Range fence;
+
+    public FencedFileSKVIterator(FileSKVIterator reader, Range fence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = fence;
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      var rfk = reader.getFirstKey();
+      if (fence.beforeStartKey(rfk)) {
+        return fence.getStartKey();
+      } else {
+        return rfk;
+      }
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      var rlk = reader.getLastKey();
+      if (fence.afterEndKey(rlk)) {
+        return fence.getEndKey();
+      } else {
+        return rlk;
+      }
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public DataInputStream getMetaStore(String name) throws IOException {
+      return reader.getMetaStore(name);
+    }
+
+    @Override
+    public void closeDeepCopies() throws IOException {
+      reader.closeDeepCopies();
+    }
+
+    @Override
+    public void setCacheProvider(CacheProvider cacheProvider) {
+      reader.setCacheProvider(cacheProvider);
+    }
+
+    @Override
+    public void close() throws IOException {
+      reader.close();
+    }
+  }
+
+  static class FencedIndex extends FencedFileSKVIterator {
+    private final FileSKVIterator source;
+
+    public FencedIndex(FileSKVIterator source, Range seekFence) {
+      super(source, seekFence);
+      this.source = source;
+    }
+
+    @Override
+    public boolean hasTop() {
+      // this code filters out data because the rfile index iterators do not support seek
+
+      // If startKey is set then discard everything until we reach the start
+      // of the range
+      if (fence.getStartKey() != null) {
+
+        while (source.hasTop() && source.getTopKey().compareTo(fence.getStartKey()) <= 0) {
+          try {
+            source.next();
+          } catch (IOException e) {
+            throw new UncheckedIOException(e);
+          }
+        }
+      }
+
+      // If endKey is set then ensure that the current key is not passed the end of the range
+      if (fence.getEndKey() != null) {
+        return source.hasTop() && source.getTopKey().compareTo(fence.getEndKey()) <= 0;
+      }
+
+      // If neither start/end is set then just delegate to the normal hasTop() method
+      return source.hasTop();

Review Comment:
   ```suggestion
         return source.hasTop() && !fence.afterEndKey(source.getTopKey());
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3418: Add an optional range to Tablet files and fenced reader

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#discussion_r1214859471


##########
core/src/main/java/org/apache/accumulo/core/client/rfile/RFile.java:
##########
@@ -76,6 +78,38 @@ public interface InputArguments {
      * @return this
      */
     ScannerFSOptions from(String... files);
+
+    /**
+     * Specify FencedRfiles to read from. When multiple are specified the {@link Scanner}
+     * constructed will present a merged view.
+     *
+     * @param files one or more FencedRfiles to read.
+     * @return this
+     *
+     * @since 3.1.0
+     */
+    ScannerFSOptions from(FencedRfile... files);
+
+    /**
+     * @since 3.1.0
+     */
+    class FencedRfile {

Review Comment:
   Oops that was a typo, I'll fix that.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3418: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3418:
URL: https://github.com/apache/accumulo/pull/3418#issuecomment-1566244211

   I started work today in another branch on updates to store metadata with a range. It's not done and not everything is finished yet (like garbage collection code needs fixing) so a lot of tests don't run but here is the current commit where i left off: https://github.com/apache/accumulo/commit/d936ee7fb9c8e38d5441efd512818492ec558503 That branch is based off of this PR and also has the changes merged in from #3417 and #3432 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org