You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "cshannon (via GitHub)" <gi...@apache.org> on 2023/04/22 18:04:52 UTC

[GitHub] [accumulo] cshannon opened a new pull request, #3332: WIP - Add an optional fenced range to Tablet files

cshannon opened a new pull request, #3332:
URL: https://github.com/apache/accumulo/pull/3332

   This allows treating each fenced range of an RFile as a separate file for reading purposes. This PR is an alternate version to #3246 and is part of #1327 
   
   The idea here is to associate a range/fence with a Tablet file so that we can easily treat the combination of an RFile and fence as a unique file which means less changes to the rest of the code base when we have multiple ranges for a single file as the code just thinks they are unique files. For more information see the comment [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509174746) and [here](https://github.com/apache/accumulo/issues/1327#issuecomment-1509338370).
   
   So, for example, if we had 5 ranges defined for an RFile we'd load up 5 "files" that were fenced off by each range and the rest of the code would just get a list of 5 readers and wouldn't know that they were actually the same file and wouldn't care when iterating. The 5 fenced files (that are really just subsets of the same file) are treated identical by everywhere else in the code as 5 unique files.
   
   There's still more work to do here but this is my current progress. Things still to do:
   
   1. Add a fence for the RFile index as well
   2. Add some tests to verify the changes in RFileOperations and FileManager work when opening a ranged file
   3. I will create a separate PR to handle writing the new DataFileValue metadata which will include adding a range to the CQ and storing a separate DFV for each combination of file and range. 
   4. After we can persist the ranges we need to update everywhere that uses a reader to be able to pass in ranges (compaction, scanners, etc)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180534839


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -411,13 +419,26 @@ public class ReaderBuilder extends FileHelper implements ReaderTableConfiguratio
     private CacheProvider cacheProvider;
     private Cache<String,Long> fileLenCache;
     private boolean seekToBeginning = false;
+    private Range fence;
 
     public ReaderTableConfiguration forFile(String filename, FileSystem fs, Configuration fsConf,
         CryptoService cs) {
       filename(filename).fs(fs).fsConf(fsConf).cryptoService(cs);
       return this;
     }
 
+    public ReaderTableConfiguration forFile(TabletFile file, FileSystem fs, Configuration fsConf,
+        CryptoService cs) {
+      return forFile(file.getPathStr(), file.getFence(), fs, fsConf, cs);
+    }
+
+    public ReaderTableConfiguration forFile(String filename, Range fence, FileSystem fs,

Review Comment:
   Yeah I can rework it a bit and add that, it would probably make things a little better with sticking to the builder pattern.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1533434072

   > Is the concern that with the above situation adding a range per file will add more data?
   
   I thought this through a little more, and with respect to bulk loads  I don't think there is an issue.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1530590999

   My main concern is bulk imports here. It's possible to have one bulk import file that maps to many tablets. Depending on how many bulk imports you do in a day, you could end up having a lot of short-lived file entries in the tablet metadata, with them getting removed as the bulk import files are compacted away. The tablet metadata will go through grow/shrink cycles depending on the frequency of the bulk imports into that tablet. If that is happening on a large number of tablets, then it's possible/likely to reach the split threshold for the metadata table tablets.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon closed pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon closed pull request #3332: WIP - Add an optional fenced range to Tablet files
URL: https://github.com/apache/accumulo/pull/3332


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175467314


##########
core/src/main/java/org/apache/accumulo/core/client/admin/compaction/CompactableFile.java:
##########
@@ -31,6 +32,8 @@ public interface CompactableFile {
 
   public String getFileName();
 
+  public Range getFence();

Review Comment:
   This class is in public API, so the new method needs a since tag in javadoc



##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   Instead of adding this getFence method would is be possible to pass TabletFile to this class instead of a String?  Then the getFence() and getFilename() methods could be replaced with a getTabletFile() method that has the filename and range.



##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -591,4 +593,35 @@ public synchronized int getNumOpenFiles() {
   public ScanFileManager newScanFileManager(KeyExtent tablet, CacheProvider cacheProvider) {
     return new ScanFileManager(tablet, cacheProvider);
   }
+
+  private static class FencedFile {

Review Comment:
   Could TabletFile be used intead of this class?



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   Even when the fence range is not infinite, the first and last keys of the underlying file may still fall within the range.   If so then those should be returned.
   
   ```suggestion
       @Override
       public Key getFirstKey() throws IOException {
         var rfk = reader.getFirstKey();
         if(fence.beforeStartKey(rfk)) {
           // TODO could seek the reader to find the least key in the range.  Not sure this is worth the effort, would result in ensuring when a tablet splits that files are only assigned to child tablets when they have data for that tablet.  Also its possible that rfile has no key within the fence range.
           return fence.getStartKey();
         } else {
           return rfk;
         }
       }
   
       @Override
       public Key getLastKey() throws IOException {
         var rlk = reader.getLastKey();
         if(fence.afterEndKey(rlk)) {
           //TODO could attempt to find the greatest key in the reader that is in the range.   may need to add internal methods to rfile to do this efficiently
           return fence.getEndKey();
         } else {
           return rlk;
         }
       }
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {

Review Comment:
   If not already done, a test like the following would be useful.
   
    1. seek with a range that overlaps the fence
    2. Read some of the data from the iterator, but not all (leaving the iterator such that hasTop() is currently returning true)
    3. seek with a range that disjoint from the fence
    4. assert that hasTop returns false
   
   This test ensures that seek properly resets the state of the iterator for the disjoint case.  I think the code currently handles this case correctly, but not sure if its tested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175581316


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   Feel free to ignore the TODOs in the suggestion.  I was thinking through a situation that I was not sure of the best way to handle, but I don't think it has to be handled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1181581407


##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -244,8 +244,9 @@ private List<String> takeOpenFiles(Collection<String> files,
     return filesToOpen;
   }
 
-  private Map<FileSKVIterator,String> reserveReaders(KeyExtent tablet, Collection<String> files,
-      boolean continueOnFailure, CacheProvider cacheProvider) throws IOException {
+  private Map<FileSKVIterator,TabletFile> reserveReaders(KeyExtent tablet,
+      Collection<TabletFile> files, boolean continueOnFailure, CacheProvider cacheProvider)
+      throws IOException {
 
     if (!tablet.isMeta() && files.size() >= maxOpen) {

Review Comment:
   `files` may contain multiple instances that reference the same underlying file, just with a different range, right? This would seem to me that there is a higher probability that users will hit the limit set by `TSERV_SCAN_MAX_OPENFILES` and other file limit properties. I think we need to take this into account when checking the number of open files against their respective properties. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180780194


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   So looking at this I'm not sure the best way to handle the fence/range here in `FileOperations`. FileOperations is actually abstract and [MapFileOperations](https://github.com/apache/accumulo/blob/18e7a391d0eee40ee7258cd6bdc859a8cfde649a/core/src/main/java/org/apache/accumulo/core/file/map/MapFileOperations.java) currently extends it (although might be removed by ##3359). So arguably the TabletFile or Fence reference really needs to go into the subclass which is `RFileOperations`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1530571534

   > So in general I'd expect the amount of extra generated metadata to be correlated to the number of merges that are done
   
   That is what I would expect also and I don't think it will differ much from what we currently have. Currently if 50 tablets are merged into 1 and each tablet had 2 files then the new tablet could end up with 100 files.  That should be the same with these changes except now the files will have a range. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180871129


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   I was thinking that FileOperations could look at the TabletFile and open and RFile or MapFile (not passing TabletFile down any further) and optionally (if the range is present or not empty.) wrap it with a fencing iterator.  So file operations is where the file+optional range is handled, not in rfile itself.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180521896


##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -591,4 +593,35 @@ public synchronized int getNumOpenFiles() {
   public ScanFileManager newScanFileManager(KeyExtent tablet, CacheProvider cacheProvider) {
     return new ScanFileManager(tablet, cacheProvider);
   }
+
+  private static class FencedFile {

Review Comment:
   Good point, I can just use TabletFile. I was just trying to keep the minimal amount we needed to store in the maps but TabletFile is a pretty small object already and it would make sense to use here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180538903


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   > Even when the fence range is not infinite, the first and last keys of the underlying file may still fall within the fence range. If so then those should be returned.
   
   I guess I was thinking a fence shouldn't or wouldn't be created that extends outside the file but this would be a good thing to check either way. I'm not sure if we need or want to do some validation when creating a Fence to make sure it's valid but we probably don't really need to care as long as the underlying code is smart enough to realize the fence boundary is larger than the file and handle it as you pointed out here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180871129


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   I was thinking that FileOperations could look at the TabletFile and open and RFile or MapFile (not passing TabletFile down any further) and optionally (if the range is present or not empty.) wrap it with a fencing iterator.  So file operations is where the file+optional range is handled.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1181878173


##########
server/base/src/main/java/org/apache/accumulo/server/fs/FileManager.java:
##########
@@ -244,8 +244,9 @@ private List<String> takeOpenFiles(Collection<String> files,
     return filesToOpen;
   }
 
-  private Map<FileSKVIterator,String> reserveReaders(KeyExtent tablet, Collection<String> files,
-      boolean continueOnFailure, CacheProvider cacheProvider) throws IOException {
+  private Map<FileSKVIterator,TabletFile> reserveReaders(KeyExtent tablet,
+      Collection<TabletFile> files, boolean continueOnFailure, CacheProvider cacheProvider)
+      throws IOException {
 
     if (!tablet.isMeta() && files.size() >= maxOpen) {

Review Comment:
   I would think that in this case we could just track open files by the actual file and not use the ranges. It would make sense to me that you'd only care about real files that were open in HDFS so if you had several files that were the same but different ranges we would just count it as 1 and then this wouldn't be an issue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1530684930

   >  Depending on how many bulk imports you do in a day, you could end up having a lot of short-lived file entries in the tablet metadata, with them getting removed as the bulk import files are compacted away. 
   
   Is the concern that with the above situation adding a range per file will add more data?  We could make the serialization of an infinite range that these files would have really small, that common case could be handled specially in the serialization.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1537543269

   I started making some more changes to this in a new branch for now https://github.com/cshannon/accumulo/tree/accumulo-3332-file-meta-entry. The updates in the new branch are based on changes from #3385 . If/when #3385 gets merged I can then update this PR to be based on that or open a new PR, etc. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175581316


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   Feel free to ignore the TODOs in the suggestion.  I was thinking through a situation that I was not sure of the best way to handle.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1529730624

   > I have been working today on a new commit/PR to start adding range information to the CQ when storing tablet file metadata so we can store the same file with different ranges and treat them as unique files.
   
   What's the impetus for this vs what is being done here? I'm concerned about potential explosion in temporary tablet metadata entries (bulk import, flush files) that may cause the metadata tablets to split unnecessarily.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1530237778

   > > I have been working today on a new commit/PR to start adding range information to the CQ when storing tablet file metadata so we can store the same file with different ranges and treat them as unique files.
   > 
   > What's the impetus for this vs what is being done here? I'm concerned about potential explosion in temporary tablet metadata entries (bulk import, flush files) that may cause the metadata tablets to split unnecessarily.
   
   I can let @keith-turner comment as well but I'm not sure how much of an issue this would be in practice but we can certainly test. My thoughts initially are that we really only would more than 1 range per file when merging but maybe there would be other use cases for it. I'm not sure how much that would impact bulk import as I wouldn't expect those files to have more than one range, at least on first import.  I haven't looked at bulk import much though so maybe I am missing something. So in general I'd expect the amount of extra generated metadata to be correlated to the number of merges that are done on tables and that would depend on the system and whether or not we decide to enable some sort of automatic merging in the future like we do with automatic splits.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1518719516

   @keith-turner - As I noted there is still plenty more work to do here but I wanted to submit what I had so far to get your thoughts before I go too much further. I ended up doing most of the changes to load up the fenced iterators in `FileManager` as that was where most of the tracking was done for caching of files so I had to change everything there to track unique files by Range as well. I didn't actually use the `wrapFileIterator()` method inside of `DataFileValue` as it made more sense to me to handle it in `RFileOperations` when building a reader based on how the current design was.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] ctubbsii commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "ctubbsii (via GitHub)" <gi...@apache.org>.
ctubbsii commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1530357675

   > @ctubbsii - Any thoughts on the best way to chain PRs or how to handle this situation? I know this was a problem with elasticity as well.
   
   Replied with a few possibilities in Slack instead of here, to avoid cluttering the PR with contributor workflow discussion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180517931


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);

Review Comment:
   Right, we only need to fence by rows and don't care about the entire Key object. I already commented in the now closed #3328 that I plan to use the RowRange object when it is ready but it needs to support all the existing methods like clip for it to be useful though. For now I can just keep using Range and keep those other fields null and should be easy to swap it out later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1529120364

   I have been working today on a new commit/PR to start adding range information to the CQ when storing tablet file metadata so we can store the same file with different ranges and treat them as unique files. 
   
   The PR I am working on is based off of this branch so I'm not sure the best way of going about submitting a draft PR. It would be nice to submit a separate PR for the change as it's pretty substantial but if I do that it also pulls in the commits from this PR so it's not isolated so I might as well just push the new commit here if I were to do that.
   
   @ctubbsii - Any thoughts on the best way to chain PRs or how to handle this situation? I know this was a problem with elasticity as well.
   
   I'm not quite ready to submit the draft PR yet as there's still some things that need fixing that are expecting the old format (like GC so those tests break) but the branch i'm working on is here https://github.com/cshannon/accumulo/tree/file-metadata-range  I'll create an actual draft PR when I think it's ready to get some initial reviews, hopefully later this week. In the meantime i'll be amending/force pushing a lot.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1180548861


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   I think fence range extending outside the file will be common.  The fence range comes from a tablet range and its normal for tablets to contain files that are completely inside the tablet range. So when a tablet is merged into another it brings its files along fenced via its tablet range.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1544854971

   I'm going to close this for now because I'm going to create another PR first that just makes changes with using StoredTabletFile and TabletFile in more places instead of a String first so that when we do add a Range we can get to it. See https://github.com/apache/accumulo/pull/3385#issuecomment-1544820857 for more information. Also, the old Map data file is now removed as well which causes conflicts in this branch.
   
   After I finish the changes with using Tabletfile for tracking my plan is to reopen a new PR(s) that will then add a range to Tablet files for reading (and a new iterator, etc) and also will update how we store the files in the metadata table to include a range and treat each file and range combination uniquely.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] keith-turner commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175558585


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public boolean hasTop() {
+      return reader.hasTop();
+    }
+
+    @Override
+    public void next() throws IOException {
+      reader.next();
+    }
+
+    @Override
+    public Key getTopKey() {
+      return reader.getTopKey();
+    }
+
+    @Override
+    public Value getTopValue() {
+      return reader.getTopValue();
+    }
+
+    @Override
+    public FencedReader deepCopy(IteratorEnvironment env) {
+      return new FencedReader(reader.deepCopy(env), fence);
+    }
+
+    @Override
+    public boolean isRunningLowOnMemory() {
+      return reader.isRunningLowOnMemory();
+    }
+
+    @Override
+    public void setInterruptFlag(AtomicBoolean flag) {
+      reader.setInterruptFlag(flag);
+    }
+
+    @Override
+    public Key getFirstKey() throws IOException {
+      return fence.isInfiniteStartKey() ? reader.getFirstKey() : fence.getStartKey();
+    }
+
+    @Override
+    public Key getLastKey() throws IOException {
+      return fence.isInfiniteStopKey() ? reader.getLastKey() : fence.getEndKey();
+    }

Review Comment:
   Even when the fence range is not infinite, the first and last keys of the underlying file may still fall within the fence range.   If so then those should be returned.
   
   ```suggestion
       @Override
       public Key getFirstKey() throws IOException {
         var rfk = reader.getFirstKey();
         if(fence.beforeStartKey(rfk)) {
           // TODO could seek the reader to find the least key in the range.  Not sure this is worth the effort, would result in ensuring when a tablet splits that files are only assigned to child tablets when they have data for that tablet.  Also its possible that rfile has no key within the fence range.
           return fence.getStartKey();
         } else {
           return rfk;
         }
       }
   
       @Override
       public Key getLastKey() throws IOException {
         var rlk = reader.getLastKey();
         if(fence.afterEndKey(rlk)) {
           //TODO could attempt to find the greatest key in the reader that is in the range.   may need to add internal methods to rfile to do this efficiently
           return fence.getEndKey();
         } else {
           return rlk;
         }
       }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] dlmarion commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "dlmarion (via GitHub)" <gi...@apache.org>.
dlmarion commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1175257497


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {

Review Comment:
   I wonder if the word `Index` might create confusion here considering the RFile has an index. The name of this interface is meant to convey that this is bound to some range? What about `BoundedFileSKVIterator`?



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {
+        range = fence.clip(range, true);
+        if (range == null) {
+          return;
+        }
+      }
+
+      reader.seek(range, columnFamilies, inclusive);

Review Comment:
   It appears that this just uses the Rows of the start/stop Keys and not the entire Key object. Could you use the RowRange object that Dom is working on here?



##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -411,13 +419,26 @@ public class ReaderBuilder extends FileHelper implements ReaderTableConfiguratio
     private CacheProvider cacheProvider;
     private Cache<String,Long> fileLenCache;
     private boolean seekToBeginning = false;
+    private Range fence;
 
     public ReaderTableConfiguration forFile(String filename, FileSystem fs, Configuration fsConf,
         CryptoService cs) {
       filename(filename).fs(fs).fsConf(fsConf).cryptoService(cs);
       return this;
     }
 
+    public ReaderTableConfiguration forFile(TabletFile file, FileSystem fs, Configuration fsConf,
+        CryptoService cs) {
+      return forFile(file.getPathStr(), file.getFence(), fs, fsConf, cs);
+    }
+
+    public ReaderTableConfiguration forFile(String filename, Range fence, FileSystem fs,

Review Comment:
   Would creating a `withRange` method reduce the number of overall changes? Calling code could be:
   ```
   forFile(file, fs, conf, cs).withRange(range)
   ```



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFileOperations.java:
##########
@@ -53,13 +56,16 @@ public class RFileOperations extends FileOperations {
 
   private static final Collection<ByteSequence> EMPTY_CF_SET = Collections.emptySet();
 
-  private static RFile.Reader getReader(FileOptions options) throws IOException {
+  private static IndexedFileSKVIterator getReader(FileOptions options) throws IOException {
     CachableBuilder cb = new CachableBuilder()
         .fsPath(options.getFileSystem(), new Path(options.getFilename()), options.dropCacheBehind)
         .conf(options.getConfiguration()).fileLen(options.getFileLenCache())
         .cacheProvider(options.cacheProvider).readLimiter(options.getRateLimiter())
         .cryptoService(options.getCryptoService());
-    return new RFile.Reader(cb);
+    final RFile.Reader reader = new RFile.Reader(cb);
+    return Optional.ofNullable(options.getFence())

Review Comment:
   ```suggestion
       return options.getFence() == null ? reader : new FencedReader(reader, options.getFence());
   ```
   
   Functionally equivalent, but doesn't create objects.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1181086657


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   So the next problem is all the validation done by TabletFile. I was trying to just replace filename with TabletFile everywhere and require those objects but that breaks quite a bit of things. The reason is TabletFile does a lot of validation in the constructor to ensure the correct directory structure and lots of places (mostly tests) are not passing in rfiles that will pass validation because they are temp files.  For example https://github.com/apache/accumulo/blob/ba472d6e24daa8f0014a22cabace3061f5d46413/core/src/test/java/org/apache/accumulo/core/client/rfile/RFileClientTest.java#L230
   
   I also don't know if any of the Map file stuff would pass validation either (I doubt it) but there are no tests for that to even try it unless I write some.
   
   So not sure the past approach. I think the end goal would be to only support TabletFile to prevent errors so we'd need to create another implementation of TabletFile that doesn't do validation checks (maybe just for testing). But for the short term for now maybe we just could have FileOperations support passing either a TabletFile or a filename and then we can track both and make sure only 1 type is set. 
   
   But I'm good with doing whatever, it really just depends how much code we want to preemptively change right now for places that are actually using FileOperations vs waiting until we are storing the new ranges in metadata and can actually use the data and make the change then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1181086657


##########
core/src/main/java/org/apache/accumulo/core/file/FileOperations.java:
##########
@@ -267,6 +271,10 @@ public Set<ByteSequence> getColumnFamilies() {
     public boolean isRangeInclusive() {
       return inclusive;
     }
+
+    public Range getFence() {

Review Comment:
   So the next problem is all the validation done by TabletFile. I was trying to just replace filename with TabletFile everywhere and require those objects but that breaks quite a bit of things. The reason is TabletFile does a lot of validation in the constructor to ensure the correct directory structure and lots of places (mostly tests) are not passing in rfiles that pass validation because they are temp files.  For example https://github.com/apache/accumulo/blob/ba472d6e24daa8f0014a22cabace3061f5d46413/core/src/test/java/org/apache/accumulo/core/client/rfile/RFileClientTest.java#L230
   
   I also don't know if any of the Map file stuff would pass validation either (I doubt it) but there are no tests for that to even try it unless I write some.
   
   So not sure the past approach. I think the end goal would be to only support TabletFile to prevent errors so we'd need to create another implementation of TabletFile that doesn't do validation checks (maybe just for testing). But for the short term for now maybe we just could have FileOperations support passing either a TabletFile or a filename and then we can track both and make sure only 1 type is set. 
   
   But I'm good with doing whatever, it really just depends how much code we want to preemptively change right now for places that are actually using FileOperations vs waiting until we are storing the new ranges in metadata and can actually use the data and make the change then.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on a diff in pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on code in PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#discussion_r1181094376


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1556,117 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  interface IndexedFileSKVIterator extends FileSKVIterator {
+    FileSKVIterator getIndex() throws IOException;
+  }
+
+  static class FencedReader implements IndexedFileSKVIterator {
+
+    private final Reader reader;
+    private final Range fence;
+
+    public FencedReader(Reader reader, Range seekFence) {
+      this.reader = Objects.requireNonNull(reader);
+      this.fence = seekFence;
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      reader.clear();
+
+      if (fence != null) {

Review Comment:
   I added a new test case for this in my latest commit: https://github.com/apache/accumulo/blob/d755bc3c7bc9639660e3f1edb70a616fc12906f1/core/src/test/java/org/apache/accumulo/core/file/rfile/FencedRFileTest.java#L186



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [accumulo] cshannon commented on pull request #3332: WIP - Add an optional fenced range to Tablet files

Posted by "cshannon (via GitHub)" <gi...@apache.org>.
cshannon commented on PR #3332:
URL: https://github.com/apache/accumulo/pull/3332#issuecomment-1528807830

   > I wonder if instead of creating `FencedReader` and `IndexedFileSKVIterator` the existing objects should be modified to include the fence Range. I think the fence Range should be required, not optional. I think that would make the code less confusing in the future - one way of doing things vs the developer having to make the "right" choice later.
   
   I've been going back and forth with this idea. We could ultimately decide to do this but I talked to @keith-turner about this a bit yesterday and and he thought having it separate is nice for testing and for optimization as we can leave out all the extra code for handing fencing in the iterator stack if there's no fence effective fencing going on. Even if we make it required it still could be effectively optional and ignored if it is an infinite range or covers the entire file. On the other hand we could probably write it in such a way that even if we require a fence object in the RFile we could essentially avoid the performance penalty by skipping all the checks and seeks and stuff if we can detect that the rfile fits entirely in the fence. I can let @keith-turner comment here too if he wants to elaborate more on this thoughts about this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org