Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/11/17 22:22:32 UTC

[GitHub] [accumulo] keith-turner edited a comment on issue #2361: Utility to generate splits

keith-turner edited a comment on issue #2361:
URL: https://github.com/apache/accumulo/issues/2361#issuecomment-972149849


   > And getting that index through the Rfile reader here
   
   Yeah, that is the code I was thinking about.  I looked around and found the following code that the tserver uses to find a single split point by inspecting indexes.
   
   https://github.com/apache/accumulo/blob/f8bb900ae080fe0f54dfe04f9e1ad8c4dd2e7930/server/base/src/main/java/org/apache/accumulo/server/util/FileUtil.java#L289
   
   The code makes two passes.  First, it counts the number of index entries.  Second, it reads through them again using a merged view of the indexes and takes the count/2 entry.  Could possibly do something similar for N entries: do one pass over the index data to count the entries, and then another pass to take every count/N entry.  The code above falls back to scanning the data in the rfiles instead of the index; we would probably need to do that sometimes for this use case also.
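
A rough sketch of the two-pass idea for N split points, not the actual FileUtil code. Assumptions: each index is modeled as an already-sorted list of plain strings rather than rfile index keys, `SplitSketch` and `findSplits` are hypothetical names, and N splits are taken at positions i * count / (N + 1) so that N = 1 reduces to the count/2 entry described above. The fallback to scanning rfile data is not covered.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: sorted lists of strings stand in for rfile indexes.
class SplitSketch {

  // One cursor per index: the current key plus the remainder of that index.
  private static final class Cursor implements Comparable<Cursor> {
    String key;
    final Iterator<String> rest;

    Cursor(Iterator<String> it) {
      this.rest = it;
      this.key = it.next();
    }

    @Override
    public int compareTo(Cursor other) {
      return key.compareTo(other.key);
    }
  }

  static List<String> findSplits(List<List<String>> indexes, int numSplits) {
    // Pass 1: count the index entries across all files.
    long count = 0;
    for (List<String> index : indexes) {
      count += index.size();
    }

    List<String> splits = new ArrayList<>();
    if (numSplits <= 0 || count == 0) {
      return splits;
    }

    // Pass 2: walk a merged (globally sorted) view of the indexes.
    PriorityQueue<Cursor> heap = new PriorityQueue<>();
    for (List<String> index : indexes) {
      if (!index.isEmpty()) {
        heap.add(new Cursor(index.iterator()));
      }
    }

    long seen = 0;
    int next = 1; // which split point we are looking for (1..numSplits)
    long target = count * next / (numSplits + 1);
    while (!heap.isEmpty() && next <= numSplits) {
      Cursor cursor = heap.poll();
      String key = cursor.key;
      seen++;
      if (cursor.rest.hasNext()) {
        cursor.key = cursor.rest.next();
        heap.add(cursor);
      }
      // Emit the key once the merged position reaches the next target,
      // skipping zero targets and consecutive duplicate split points.
      while (next <= numSplits && seen >= target) {
        boolean duplicate =
            !splits.isEmpty() && splits.get(splits.size() - 1).equals(key);
        if (target > 0 && !duplicate) {
          splits.add(key);
        }
        next++;
        target = count * next / (numSplits + 1);
      }
    }
    return splits;
  }
}
```

With few index entries relative to N, this returns fewer than N splits, which is presumably where falling back to the rfile data would come in.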


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org