Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2019/09/04 00:33:07 UTC

[GitHub] [accumulo] ctubbsii commented on issue #1152: Fix #1120 - Improve batch writer throughput

ctubbsii commented on issue #1152: Fix #1120 - Improve batch writer throughput
URL: https://github.com/apache/accumulo/pull/1152#issuecomment-527691138
 
 
   > @ctubbsii, I tested my changes by running the ReadWriteIT last week and I saw some 5% speedup compared to master branch in the interleaved, sunnyLG, and sunnyDay tests on one occasion.
   
   That IT is designed to be a basic smoke test to ensure we can put data into Accumulo and subsequently view it. It's not designed or well-suited for performance testing or experimentation.
   
   > I re-ran the tests again a few times and the running times went up and down a little from run to run so I really can't say that I have improved anything here.
   
   Variance is pretty normal. If you want to see whether this code changed things in a meaningful way, you'll probably need to do some controlled experiments with various sizes and settings to demonstrate (or disprove) improved throughput with the change(s). A good experiment will also eliminate JVM startup time as a variable by avoiding MiniAccumuloInstance (used by ReadWriteIT), and would run with several tservers (to ensure the binning part of the batch writer is exercised, since that's a relevant code path for the question of throughput).
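
As a rough illustration of the kind of controlled experiment described above, the sketch below times a workload over several measured runs after a few unmeasured warm-up runs, then reports the mean and sample standard deviation of the throughput so that run-to-run variance is visible. Everything named here (`ThroughputTrial`, `measure`, `summarize`, and the placeholder workload) is hypothetical and not part of Accumulo; a real experiment would replace the placeholder with code that writes mutations through a batch writer against an already-running cluster with several tservers.

```java
final class ThroughputTrial {

  /** Mean and sample standard deviation of the measured rates (items per second). */
  static final class Stats {
    final double mean;
    final double stdDev;

    Stats(double mean, double stdDev) {
      this.mean = mean;
      this.stdDev = stdDev;
    }
  }

  /**
   * Runs the workload 'warmups' times without timing it, then 'trials' times
   * with timing, and summarizes the per-run throughput.
   */
  static Stats measure(java.util.function.IntConsumer workload, int items, int warmups, int trials) {
    for (int i = 0; i < warmups; i++) {
      workload.accept(items); // warm-up runs are discarded
    }
    double[] rates = new double[trials];
    for (int i = 0; i < trials; i++) {
      long start = System.nanoTime();
      workload.accept(items);
      long elapsedNanos = Math.max(1L, System.nanoTime() - start); // avoid dividing by zero
      rates[i] = items / (elapsedNanos / 1e9);
    }
    return summarize(rates);
  }

  /** Computes the mean and the sample (n - 1) standard deviation. */
  static Stats summarize(double[] samples) {
    double sum = 0.0;
    for (double s : samples) {
      sum += s;
    }
    double mean = sum / samples.length;
    double squares = 0.0;
    for (double s : samples) {
      squares += (s - mean) * (s - mean);
    }
    double stdDev = samples.length > 1 ? Math.sqrt(squares / (samples.length - 1)) : 0.0;
    return new Stats(mean, stdDev);
  }

  public static void main(String[] args) {
    // Placeholder workload (an assumption for illustration only): it stands in
    // for "write n mutations and flush" against a long-running cluster.
    java.util.function.IntConsumer placeholder = n -> {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < n; i++) {
        sb.append(i % 10);
      }
      if (sb.length() != n) {
        throw new IllegalStateException("unexpected length");
      }
    };
    // Vary the size, as suggested above, and compare mean against spread.
    for (int items : new int[] {10_000, 100_000, 1_000_000}) {
      Stats s = measure(placeholder, items, 3, 10);
      System.out.printf("items=%d mean=%.0f/s stddev=%.0f/s%n", items, s.mean, s.stdDev);
    }
  }
}
```

Running the same harness against both master and the branch, and comparing the difference in means to the reported standard deviations, gives a first indication of whether a difference such as 5% is larger than the noise.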
   
   > I will look at the TabletServerBatchWriter class a little more deeply to see if I can come up with any better ideas on how to improve throughput. I will comment on issue #1120 in the issue thread instead of the pull request from now on.
   
   The ideas in this PR don't seem to be very similar to the 2-layer queuing strategy described in the old JIRA issue linked on #1120. Have you attempted to implement the design proposed in that ticket at all? It could be implemented alongside the existing BatchWriter, as a new API, instead of modifying the current one. And, if it turns out to perform better, we can either swap in the new implementation, or deprecate the old API in favor of the new one.
