Posted to notifications@accumulo.apache.org by "FineAndDandy (via GitHub)" <gi...@apache.org> on 2024/01/02 16:26:52 UTC

[I] RFile writes should utilize multiple threads [accumulo]

FineAndDandy opened a new issue, #4124:
URL: https://github.com/apache/accumulo/issues/4124

   **Is your feature request related to a problem? Please describe.**
   Write operations to an rfile are serialized. When writing large rfiles in map reduce jobs, this can produce very long tails in the jobs. The bottleneck is often compression rather than I/O.
   
   **Describe the solution you'd like**
   Utilizing multiple threads to process multiple blocks in parallel could dramatically improve write performance. A dedicated thread would still be needed to write completed blocks in order, but that should be feasible. This could be scaled based on available memory for buffering.
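
   The proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not Accumulo's RFile code: `ParallelBlockWriter`, `append`, and the `maxPendingBlocks` parameter are hypothetical names, and GZIP from the JDK stands in for whichever codec the file is configured with. Blocks are compressed on a thread pool, while a single writer thread consumes the results in submission order. The bounded queue caps how many blocks are buffered in memory.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compress blocks in parallel, write them in order.
class ParallelBlockWriter implements AutoCloseable {
  // Sentinel that tells the writer thread to stop; compared by identity.
  private static final Future<byte[]> END =
      CompletableFuture.completedFuture(new byte[0]);

  private final ExecutorService compressors;
  private final BlockingQueue<Future<byte[]>> inOrder;
  private final Thread writer;
  private volatile Exception writeError;

  ParallelBlockWriter(OutputStream out, int threads, int maxPendingBlocks) {
    compressors = Executors.newFixedThreadPool(threads);
    // Bounded queue: append() blocks once maxPendingBlocks are in flight.
    inOrder = new ArrayBlockingQueue<>(maxPendingBlocks);
    writer = new Thread(() -> drain(out), "ordered-block-writer");
    writer.start();
  }

  // Submits a block for compression and queues its future in arrival order.
  void append(byte[] rawBlock) throws InterruptedException {
    inOrder.put(compressors.submit(() -> gzip(rawBlock)));
  }

  // Writer thread: waits for each future in queue order, so output order
  // matches append order even if later blocks finish compressing first.
  private void drain(OutputStream out) {
    try {
      for (Future<byte[]> f = inOrder.take(); f != END; f = inOrder.take()) {
        if (writeError == null) {
          try {
            out.write(f.get());
          } catch (Exception e) {
            writeError = e; // keep draining so producers never block forever
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static byte[] gzip(byte[] raw) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
      gz.write(raw);
    }
    return buf.toByteArray();
  }

  @Override
  public void close() throws Exception {
    inOrder.put(END);
    writer.join();
    compressors.shutdown();
    if (writeError != null) {
      throw writeError;
    }
  }
}
```

   Queuing futures rather than finished blocks is what preserves ordering: the writer always waits on the oldest outstanding block, and the queue capacity bounds the memory used for buffering.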
   
   **Describe alternatives you've considered**
   Adding pipelining to the existing code could be a smaller lift and still yield a big performance improvement.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] RFile writes should utilize multiple threads [accumulo]

Posted by "keith-turner (via GitHub)" <gi...@apache.org>.
keith-turner commented on issue #4124:
URL: https://github.com/apache/accumulo/issues/4124#issuecomment-1907045405

   Prototyped having another thread do the writes in this [branch](https://github.com/keith-turner/accumulo/tree/accumulo-4124).  Wrote this little [test program](https://github.com/keith-turner/accumulo/blob/accumulo-4124/core/src/test/java/org/apache/accumulo/core/client/rfile/Test.java); running it without the extra thread takes ~1300ms, and with the separate write thread it takes ~760ms.  To use a separate write thread, I manually modified the RFileWriter class to make it use the new ThreadedFileSKVWriter class.  While it was using a separate write thread, I ran top and noticed the java process was using 200% CPU.
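
   The general idea of handing writes to a separate thread can be sketched as below. This is not the code in the linked branch and makes no claim about how ThreadedFileSKVWriter is implemented; `BackgroundWriter`, `append`, and `maxPending` are hypothetical names used only for illustration. The producing thread enqueues chunks and returns immediately, while one background thread performs the actual writes in order.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a single background thread performs all writes.
class BackgroundWriter implements AutoCloseable {
  // Sentinel that tells the writer thread to stop; compared by identity.
  private static final byte[] END = new byte[0];

  private final BlockingQueue<byte[]> pending;
  private final Thread writer;
  private volatile IOException failure;

  BackgroundWriter(OutputStream sink, int maxPending) {
    pending = new ArrayBlockingQueue<>(maxPending);
    writer = new Thread(() -> drain(sink), "background-writer");
    writer.start();
  }

  // Called by the producer; blocks only when maxPending chunks are waiting.
  void append(byte[] chunk) throws InterruptedException {
    pending.put(chunk);
  }

  // Runs on the background thread and writes chunks in FIFO order.
  private void drain(OutputStream sink) {
    try {
      for (byte[] chunk = pending.take(); chunk != END; chunk = pending.take()) {
        if (failure == null) {
          try {
            sink.write(chunk);
          } catch (IOException e) {
            failure = e; // keep draining so the producer never blocks forever
          }
        }
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  @Override
  public void close() throws IOException, InterruptedException {
    pending.put(END);
    writer.join();
    if (failure != null) {
      throw failure;
    }
  }
}
```

   If `sink` is a compressing stream such as `java.util.zip.GZIPOutputStream`, the compression work runs on the background thread, so the producer and the compressor can each keep a core busy.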

