You are viewing a plain text version of this content. The canonical link for it is here.
Posted to notifications@accumulo.apache.org by GitBox <gi...@apache.org> on 2021/09/13 23:11:28 UTC

[GitHub] [accumulo] keith-turner commented on a change in pull request #2269: Improve BatchWriter throughput

keith-turner commented on a change in pull request #2269:
URL: https://github.com/apache/accumulo/pull/2269#discussion_r707776640



##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -201,47 +202,25 @@ public TabletServerBatchWriter(ClientContext context, BatchWriterConfig config)
     this.context = context;
     this.executor =
         ThreadPools.createGeneralScheduledExecutorService(this.context.getConfiguration());
-    this.failedMutations = new FailedMutations();
     this.maxMem = config.getMaxMemory();
     this.maxLatency = config.getMaxLatency(TimeUnit.MILLISECONDS) <= 0 ? Long.MAX_VALUE
         : config.getMaxLatency(TimeUnit.MILLISECONDS);
     this.timeout = config.getTimeout(TimeUnit.MILLISECONDS);
-    this.mutations = new MutationSet();
-    this.lastProcessingStartTime = System.currentTimeMillis();
+    this.incomingQueue = new MutationSet();
     this.durability = config.getDurability();
-
     this.writer = new MutationWriter(config.getMaxWriteThreads());
+  }
 
-    if (this.maxLatency != Long.MAX_VALUE) {
-      executor.scheduleWithFixedDelay(Threads.createNamedRunnable("BatchWriterLatencyTimer", () -> {
-        try {
-          synchronized (TabletServerBatchWriter.this) {
-            if ((System.currentTimeMillis() - lastProcessingStartTime)
-                > TabletServerBatchWriter.this.maxLatency)
-              startProcessing();
-          }
-        } catch (Exception e) {
-          updateUnknownErrors("Max latency task failed " + e.getMessage(), e);
-        }
-      }), 0, this.maxLatency / 4, TimeUnit.MILLISECONDS);
+  private void decrementMemUsed(long amount) {
+    totalMemUsed.getAndAdd(-1 * amount);
+    synchronized (this) {
+      this.notifyAll();
     }
   }
 
-  private synchronized void startProcessing() {
-    if (mutations.getMemoryUsed() == 0)
-      return;
-    lastProcessingStartTime = System.currentTimeMillis();
-    writer.queueMutations(mutations);
-    mutations = new MutationSet();
-  }
+  public void addMutation(final TableId table, final Mutation m) throws MutationsRejectedException {

Review comment:
       I am thinking just sticking with synchronized for the method, and avoiding instance individual vars that support concurrency , make its much easier reason about operations across multiple instance vars.  For example in this method there is a check `if(closed){...}`, however since the method is not syncronized it could be closed at any point in the method.  Is the code still correct for the closed check?  Not sure, but its harder to reason about.
   
   If the synchronized blocks of code only wrap code that manipulates in mem structs, then they should all be extremely fast relative to actually writing mutations to a tserver.  So w/ that in mind I am thinking the synchronized blocks are easy to reason about and will not be a bottleneck relative to the overall operation of the batch writer.  I think we could still avoid the wait till mem is half full behavior while still using sync block.

##########
File path: core/src/main/java/org/apache/accumulo/core/clientImpl/TabletServerBatchWriter.java
##########
@@ -753,15 +721,16 @@ private void addMutations(MutationSet mutationsToSend) {
         log.trace(String.format("Started sending %,d mutations to %,d tablet servers", count,
             binnedMutations.keySet().size()));
 
-      // randomize order of servers
-      ArrayList<String> servers = new ArrayList<>(binnedMutations.keySet());
-      Collections.shuffle(servers);
+      // order servers by decreasing number of mutations
+      TreeMap<Integer,String> servers = new TreeMap<>();
+      binnedMutations.forEach((server, tsm) -> servers.put(tsm.getMutations().size(), server));

Review comment:
       The more I think about this I am not sure if its a good ideal.  Its probably a good thing for a single batch writer writing to many tservers.  However for many batch writers it seems like it could lead to a degenerate case where all the batch writer write to a few tservers, where each batch writer randomly ordering tservers would avoid this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscribe@accumulo.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org