Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/12/01 19:27:22 UTC

[GitHub] [accumulo] keith-turner commented on pull request #2368: Create GenerateSplits utility

keith-turner commented on pull request #2368:
URL: https://github.com/apache/accumulo/pull/2368#issuecomment-983983979


   > Based on your comments, it sounded like you were avoiding selecting the first split, why would that be? 
   
   That comment was based on a bug in the first cut of the selection algorithm, which always used the first source row. For larger amounts of data this would not be desirable. If there are 10,000 source rows from which you want to derive 100 splits, you would not want to use the first source row as the first split; you would want roughly the 100th source row. For the case you are testing, with 6 source rows and 4 desired splits, it probably does not matter if the first source row is chosen.
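
To make the arithmetic above concrete, here is a minimal sketch of one way to pick evenly spaced splits. It is not the code from this PR: the function name `pick_splits` and the spacing rule `i * n // (k + 1)` are assumptions chosen for illustration, and the sketch is in Python rather than Java for brevity. It assumes the source rows are already sorted.

```python
def pick_splits(source_rows, desired):
    """Pick up to `desired` evenly spaced rows from sorted `source_rows`.

    Hypothetical helper for illustration only, not the GenerateSplits code.
    """
    n = len(source_rows)
    if desired <= 0 or n == 0:
        return []
    if desired >= n:
        # Not enough rows to be selective, so every row becomes a split.
        return list(source_rows)
    # `desired` splits cut the data into `desired + 1` ranges, so step
    # through the rows in increments of n / (desired + 1). Starting at
    # i = 1 skips the first source row instead of always selecting it.
    return [source_rows[(i * n) // (desired + 1)] for i in range(1, desired + 1)]


if __name__ == "__main__":
    rows = ["row%05d" % i for i in range(10000)]
    splits = pick_splits(rows, 100)
    print(splits[0])                       # row00099, i.e. the 100th source row
    print(pick_splits(list("abcdef"), 4))  # ['b', 'c', 'd', 'e']
```

With 10,000 rows and 100 desired splits the first split lands on the 100th source row. With 6 rows and 4 splits this rule happens to skip the first row as well, but in a case that small the choice matters little.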


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org