You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/04/24 16:59:54 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175467314


##########
core/src/main/java/org/apache/accumulo/core/client/admin/compaction/CompactableFile.java:
##########
@@ -31,6 +32,8 @@ public interface CompactableFile {
 
   public String getFileName();
 
+  public Range getFence();

Review Comment:
   This class is in public API, so the new method needs a since tag in javadoc



##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   Instead of adding this getFence method would is be possible to pass TabletFile to this class instead of a String?  Then the getFence() and getFilename() methods could be replaced with a getTabletFile() method that has the filename and range.



##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -591,4 +593,35 @@ public synchronized int getNumOpenFiles() {
   public ScanFileManager newScanFileManager(KeyExtent tablet, CacheProvider cacheProvider) {
     return new ScanFileManager(tablet, cacheProvider);
   }
+
+  private static class FencedFile {

Review Comment:
   Could TabletFile be used intead of this class?



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   Even when the fence range is not infinite, the first and last keys of the underlying file may still fall within the range.   If so then those should be returned.
   
   ```suggestion
       @Override
       public Key getFirstKey() throws IOException {
         var rfk = reader.getFirstKey();
         if(fence.beforeStartKey(rfk)) {
           // TODO could seek the reader to find the least key in the range.  Not sure this is worth the effort, would result in ensuring when a tablet splits that files are only assigned to child tablets when they have data for that tablet.  Also its possible that rfile has no key within the fence range.
           return fence.getStartKey();
         } else {
           return rfk;
         }
       }
   
       @Override
       public Key getLastKey() throws IOException {
         var rlk = reader.getLastKey();
         if(fence.afterEndKey(rlk)) {
           //TODO could attempt to find the greatest key in the reader that is in the range.   may need to add internal methods to rfile to do this efficiently
           return fence.getEndKey();
         } else {
           return rlk;
         }
       }
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {

Review Comment:
   If not already done, a test like the following would be useful.
   
    1. seek with a range that overlaps the fence
    2. Read some of the data from the iterator, but not all (leaving the iterator such that hasTop() is currently returning true)
    3. seek with a range that disjoint from the fence
    4. assert that hasTop returns false
   
   This test ensures that seek properly resets the state of the iterator for the disjoint case.  I think the code currently handles this case correctly, but not sure if its tested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org