Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/11/26 22:38:56 UTC

[GitHub] [accumulo-testing] phrocker commented on issue #119: Add asynchronous input format and bulk key.

phrocker commented on issue #119: Add asynchronous input format and bulk key.
URL: https://github.com/apache/accumulo-testing/pull/119#issuecomment-558848061
 
 
   I didn't create an issue, so allow me to provide justification for this PR. In reviewing running jobs I realized two things:
   
   A) Spills and sorts were spending an inordinate amount of time sorting Keys. To address this, BulkKey will have a comparator that reduces deserialization time and memory use.
   B) Generating keys takes quite some time. Parallelization will let us create them asynchronously while leaving the current code path unchanged.
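
To illustrate point A, here is a minimal sketch of comparing keys in their serialized form, so the sort never has to deserialize them into objects. It is not the BulkKey comparator from this PR: the class name `RawBytesOrder` and the fixed-width row layout are assumptions made for the example. The method signature mirrors Hadoop's `RawComparator.compare`, but the sketch depends only on the JDK.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the BulkKey class from this PR.
public class RawBytesOrder {

  // Compare two serialized keys in place, byte by byte, treating bytes as
  // unsigned. No key object is allocated or deserialized.
  public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
    int n = Math.min(l1, l2);
    for (int i = 0; i < n; i++) {
      int a = b1[s1 + i] & 0xff;
      int b = b2[s2 + i] & 0xff;
      if (a != b) {
        return a - b;
      }
    }
    // All shared bytes are equal, so the shorter key sorts first.
    return l1 - l2;
  }

  public static void main(String[] args) {
    // Two 8-byte rows stored back to back in one buffer (assumed layout).
    byte[] buf = "row_0001row_0002".getBytes(StandardCharsets.UTF_8);
    System.out.println(compare(buf, 0, 8, buf, 8, 8) < 0); // prints "true"
  }
}
```

Whether the real comparator can work purely on bytes depends on how the key is serialized; a length-prefixed or multi-field layout would need the comparison to walk the fields in order.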
   
   I ran these on a test system with very positive results. Happy to provide any benchmark results if desired. I think with Keith's ideas I'll likely be able to make some additional improvements. 
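
To illustrate point B, here is a minimal sketch of generating keys on a background thread while the consumer keeps pulling them one at a time, as it would from a synchronous source. It is not the input format from this PR: the class name `AsyncKeySource`, the `row_%010d` key layout, and the queue capacity are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch, not the asynchronous input format from this PR.
public class AsyncKeySource {
  // Sentinel marking the end of the stream; compared by identity.
  private static final String DONE = new String("DONE");

  private final BlockingQueue<String> queue;
  private boolean finished;

  public AsyncKeySource(long count, int capacity) {
    queue = new ArrayBlockingQueue<>(capacity);
    Thread producer = new Thread(() -> {
      try {
        for (long i = 0; i < count; i++) {
          // put() blocks when the queue is full, which bounds memory use.
          queue.put(String.format("row_%010d", i));
        }
        queue.put(DONE);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    producer.setDaemon(true);
    producer.start();
  }

  // Returns the next key, or null once the producer has finished.
  public String next() {
    if (finished) {
      return null;
    }
    try {
      String key = queue.take();
      if (key == DONE) {
        finished = true;
        return null;
      }
      return key;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      finished = true;
      return null;
    }
  }

  public static void main(String[] args) {
    AsyncKeySource source = new AsyncKeySource(3, 2);
    String key;
    while ((key = source.next()) != null) {
      System.out.println(key);
    }
  }
}
```

The bounded queue is the design choice that matters here: the producer runs ahead of the consumer only by `capacity` keys, and the caller's loop looks the same as it would with a synchronous generator.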
