You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by "keith-turner (via GitHub)" <gi...@apache.org> on 2023/03/21 19:49:22 UTC

[GitHub] [accumulo] keith-turner commented on a diff in pull request #3246: WIP - Prototype Ranged RFile Reader

keith-turner commented on code in PR #3246:
URL: https://github.com/apache/accumulo/pull/3246#discussion_r1143897527


##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1565,256 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  public static class RangedReader extends HeapIterator implements RFileReader {
+
+    private final List<Range> ranges;
+    private final List<FencedReader> readers;
+    private final Reader reader;
+
+    public RangedReader(CachableBlockFile.CachableBuilder b) throws IOException {
+      this(new Reader(b), List.of(new Range()), null);
+    }
+
+    public RangedReader(CachableBlockFile.CachableBuilder b, Collection<Range> ranges)
+        throws IOException {
+      this(new Reader(b), ranges, null);
+    }
+
+    protected RangedReader(Reader reader, Collection<Range> ranges,
+        SamplerConfiguration samplerConfig) {
+      super(Objects.requireNonNull(ranges).size());
+      Preconditions.checkArgument(ranges.size() > 0,
+          "Range collection must contain at least 1 range");
+      this.reader = reader;
+      this.ranges = List.copyOf(Range.mergeOverlapping(ranges));
+      this.readers = createFencedReaders(reader, this.ranges, samplerConfig);
+    }
+
+    @Override
+    public void init(SortedKeyValueIterator<Key,Value> source, Map<String,String> options,
+        IteratorEnvironment env) throws IOException {
+      throw new UnsupportedOperationException();
+    }
+
+    @Override
+    public RangedReader deepCopy(IteratorEnvironment env) {
+      return new RangedReader(this, env);
+    }
+
+    private RangedReader(RangedReader other, IteratorEnvironment env) {
+      super(other.readers.size());
+      this.reader = other.reader.deepCopy(env);
+      this.ranges = List.copyOf(other.ranges);
+      this.readers = createFencedReaders(reader, ranges, env);
+    }
+
+    @Override
+    public void seek(Range range, Collection<ByteSequence> columnFamilies, boolean inclusive)
+        throws IOException {
+      clear();
+
+      for (SortedKeyValueIterator<Key,Value> skvi : readers) {

Review Comment:
   I suspect if the ranges are non overlapping we should be able to quickly find only the ranges that overlap.  
   
   ```java
   for (SortedKeyValueIterator<Key,Value> skvi : findOverlapping(range, readers)) {
           skvi.seek(range, columnFamilies, inclusive);
           addSource(skvi);
   }
   ```
   
   This could be useful for the case where there are lots of ranges, would avoid iterating through all of the ranges checking if each one overlaps.  Probably not worthwhile when there are only a few ranges though.  May be able to quickly implement this  using `TreeMap<Key,Range>` where the last key of each range and the range is placed in the map and the TreeMap.tailMap function is used to find a place to start looking for overlapping ranges.
   
   This is not something that needs to be done in this PR, if its ever done it could be in a follow on PR.



##########
core/src/main/java/org/apache/accumulo/core/file/rfile/RFile.java:
##########
@@ -1553,4 +1565,256 @@ public void setCacheProvider(CacheProvider cacheProvider) {
       reader.setCacheProvider(cacheProvider);
     }
   }
+
+  public static class RangedReader extends HeapIterator implements RFileReader {

Review Comment:
   Is there anything forcing this to be inside RFile?  If it could be exist outside of RFile then it seems like it would be better for testing because RFile and this class could be tested independently. If it were separate then when Accumulo had ranges for an RFile then it could use this iterator on top of the RFile iterator.
   
   Also want to understand if there is something forcing this design decision as that will help me review this further.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org