Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2020/09/17 17:58:55 UTC

[GitHub] [accumulo] Manno15 commented on issue #1664: Investigate optimal / configurable batch sizes for accumulo-gc deletion candidates

Manno15 commented on issue #1664:
URL: https://github.com/apache/accumulo/issues/1664#issuecomment-694402054


   From my testing, I can see a clear correlation between the number of times the batch size limit is reached and how long the collection cycle takes to complete, with the trial that never reached the limit being the quickest.
   
   The issue I faced was getting the number of delete candidates, and more importantly the candidate length, large enough without my cluster of computers crashing or running out of memory. I was never able to run a trial in which a batch size limit greater than 12 MB was reached.
   
   In my actual trials with 75k delete candidates, a batch size of 12 MB completed in 34 seconds on average, which was the quickest. With the same number of candidates, a batch size of 2 MB completed in 58 seconds. That is a sizable difference in time for such a small difference in batch size. I would like to revisit this eventually and run a full range of tests with batch sizes from at least 12 MB up to 128 MB.
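
As a rough illustration of why a smaller batch size means the limit is reached more often, here is a minimal Python sketch. It is not the accumulo-gc code: the function name `limit_hits` and the 100-byte average candidate length are hypothetical values chosen for the example, not measurements from the trials above, and it assumes every candidate has the same length.

```python
import math

MB = 1024 * 1024


def limit_hits(num_candidates, avg_candidate_bytes, batch_limit_bytes):
    """Estimate how many times a size-capped batch fills up.

    Assumes every candidate has the same length (a simplification);
    a batch counts as full once its accumulated size reaches the limit.
    """
    if num_candidates < 0 or avg_candidate_bytes <= 0 or batch_limit_bytes <= 0:
        raise ValueError("counts must be non-negative and sizes must be positive")
    # Number of candidates needed before the accumulated size reaches the limit.
    candidates_per_batch = math.ceil(batch_limit_bytes / avg_candidate_bytes)
    return num_candidates // candidates_per_batch


if __name__ == "__main__":
    # 75k candidates matches the trials; 100 bytes per candidate is hypothetical.
    for limit_mb in (2, 12, 128):
        hits = limit_hits(75_000, 100, limit_mb * MB)
        print(f"{limit_mb:>3} MB limit reached {hits} time(s)")
```

Under these assumed numbers, the 2 MB limit is reached several times while the 12 MB and 128 MB limits are never reached, which is the kind of gap that measuring real candidate lengths across a wider range of batch sizes would confirm or refute.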


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org