You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/12/23 21:59:36 UTC

[GitHub] [druid] jasonk000 opened a new pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

jasonk000 opened a new pull request #12097:
URL: https://github.com/apache/druid/pull/12097


   ### Description
   
   Improve the performance of `TaskQueue::manage` on large installations by removing unnecessary String concat / construction.
   
   Once the changes in #12096 are applied, the performance of the `TaskQueue` loop becomes much tighter, and is dominated by the logging calls.
   
   Screenshot of profiler showing `TaskQueue-Mana...` thread:
   ![image](https://user-images.githubusercontent.com/3196528/147293850-fce1bff4-9205-4a45-9f1f-44b09a1612f5.png)
   
   Screenshot of profile flamegraph for this thread, highlighting the `format` calls in the stack:
   ![image](https://user-images.githubusercontent.com/3196528/147293905-2ddeeede-4bf8-4619-be89-224c69c08ab4.png)
   
   During the task clean up loop, the `shutdown()` call is issued multiple times to the `RemoteTaskRunner` - note especially it uses the three-argument invocation of `shutdown()` method:
   
   https://github.com/apache/druid/blob/476d0bf4be4199e97695bd568d165cda98523d37/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java#L328-L335
   
   This hits the default implementation, which constructs a "reason" argument and passes it on to the two-argument `shutdown()`:
   
   https://github.com/apache/druid/blob/476d0bf4be4199e97695bd568d165cda98523d37/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java#L91-L94
   
   In particular on large clusters with thousands of tasks, this `default void shutdown` call performs a `StringUtils.format` on a task set with thousands of task IDs, and it serialises all of them to a String.
   
   Even in the event this log line is turned off, the log is still constructed, and then discarded.
   
   ### Key changed/added classes in this PR
   
   This introduces an `@Override public void shutdown()` with arguments, and has it perform the log construction and issue only if the info level is enabled. This results in significantly lower CPU consumption in this loop.
   
   This follows up the changes in #12096, and also follows the mailing list discussion here:
   https://lists.apache.org/thread/9jgdwrodwsfcg98so6kzfhdmn95gzyrj
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [x] been tested in a test Druid cluster (as part of a larger block of changes).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 edited a comment on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 edited a comment on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058768666


   Oops my bad, we had that fixed last night but I broke it again. Sorry!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 edited a comment on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 edited a comment on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000674887


   Some extra data, ..
   
   With reference to the TaskQueue-Manager thread, a capture with info enabled:
   ![image](https://user-images.githubusercontent.com/3196528/147322727-b26a0755-cf27-4dd4-99cc-fab3b50672a2.png)
   
   With respect to this thread only, the `TaskRunner::shutdown` call is 82% of CPU in the `manage()` loop, and, the useful part of the loop, `findWorkerRunningTask` consumes ~18%, the rest is logging related. By (1) turning off INFO logging on this path, and (2) avoiding the `StringUtils.format()` call, we get approx 4x more capacity through the `TaskQueue-Manager` thread.
   ![image](https://user-images.githubusercontent.com/3196528/147322939-71d3f6c6-2cff-4506-a82b-696b6d3c8c1f.png)
   
   In other words, if you take the image just above ^, and eliminate the `Logger::info` and `StringUtils::format` paths, it opens a lot of capacity through this loop.
   
   This becomes more obvious when coupled to #12096 and the yet-to-be-raised PR to reduce contention on the TaskQueue loop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058518405


   Looking at the CI failures, it looks like the build timed out because `HashPartitionMultiPhaseParallelIndexingTest` never completed. To be double sure, I ran the test locally and it completed fine. I am going to go ahead and merge this PR since it looks like tests have been flaky in general off late and I don't think logging changes introduced by this PR can contribute to the failures. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] lgtm-com[bot] commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
lgtm-com[bot] commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000557505


   This pull request **introduces 1 alert** when merging 6a5bb4b2781535ca60c1eda9913e8e2b926679b6 into 476d0bf4be4199e97695bd568d165cda98523d37 - [view on LGTM.com](https://lgtm.com/projects/g/apache/druid/rev/pr-99d3edfd434252a3a144278e967a6a962e192642)
   
   **new alerts:**
   
   * 1 for Implicit conversion from array to string


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1054774274


   Hi @FrankChen021 I moved the detailed log output to debug instead of info; I think this is a better balance. Let me know your thoughts.
   
   In 5685eafeb4.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058748780


   Looks like the last rebase you pushed reverted your String.format changes, @jasonk000 . I will submit a PR to fix CI, @clintropolis 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r818202653



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       thanks @samarthjain , this solution is much simpler - fixes applied
   
   let me know if all good and i can squash it




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058768666


   Oops my bad, i thought we had that fixed last night. Sorry!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r818152577



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";
+    shutdown(taskid, reason);
   }
 
+  /**
+   * Get the logger. Not expected to be called by consumers.
+   */
+  Logger getLogger();

Review comment:
       I would probably change the default implementation to throw an `UnsupportedOperationException`. This will also have the benefit of not having to change classes implementing `TaskRunner` interface where they don't have a logger available (like test classes). 

##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskRunnerTest.java
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+import com.google.common.base.Optional;
+import com.google.common.util.concurrent.ListenableFuture;
+import org.apache.druid.indexer.TaskStatus;
+import org.apache.druid.indexing.common.task.Task;
+import org.apache.druid.indexing.overlord.autoscaling.ScalingStats;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Executor;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+public class TaskRunnerTest
+{
+  @Test
+  public void testShutdownWithInfoOnConstructsString()
+  {
+    final ToStringMock expensiveObject = new ToStringMock();
+
+    Logger infoLogger = EasyMock.createMock(Logger.class);
+    EasyMock.expect(infoLogger.isDebugEnabled()).andReturn(true).atLeastOnce();
+    EasyMock.replay(infoLogger);
+
+    MockTaskRunner taskRunner = new MockTaskRunner(infoLogger);
+    taskRunner.shutdown("taskid", "reason %s", new Object[]{expensiveObject});
+
+    Assert.assertTrue(expensiveObject.getWasCalled());
+  }
+
+  @Test
+  public void testShutdownWithInfoOffDoesNotCallToString()
+  {
+    final ToStringMock expensiveObject = new ToStringMock();
+
+    Logger infoLogger = EasyMock.createMock(Logger.class);
+    EasyMock.expect(infoLogger.isDebugEnabled()).andReturn(false).atLeastOnce();
+    EasyMock.replay(infoLogger);
+
+    MockTaskRunner taskRunner = new MockTaskRunner(infoLogger);
+    taskRunner.shutdown("taskid", "reason %s", new Object[]{expensiveObject});
+
+    Assert.assertFalse(expensiveObject.getWasCalled());
+  }
+
+  /**
+   * EasyMock does not support mocking of toString, so this provides a custom
+   * object implementation to track whether toString was called.
+   */
+  public static class ToStringMock
+  {
+    AtomicBoolean wasCalled = new AtomicBoolean(false);
+
+    public boolean getWasCalled()

Review comment:
       nit: maybe a better method name would be `wasCalled()` ? 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       Similarly, this - 
   https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java#L299

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       Looking at the callers who pass in taskIds, I see only TaskQueue#shutdown() that is passing in the task ids. So, I propose we modify code in TaskQueue#shutdown something like this:
   ```
   // Kill tasks that shouldn't be running
       final Set<String> knownTaskIds = tasks
           .stream()
           .map(Task::getId)
           .collect(Collectors.toSet());
       final Set<String> tasksToKill = Sets.difference(runnerTaskFutures.keySet(), knownTaskIds);
       if (!tasksToKill.isEmpty()) {
         log.info("Asking taskRunner to clean up %,d tasks.", tasksToKill.size());
         // On large installations running several thousands of tasks,
         // concatenating the list of known task ids can be compupationally expensive.
         boolean logKnownTaskIds = log.isDebugEnabled();
         String reason = logKnownTaskIds
                         ? String.format("Task is not in knownTaskIds[%s]", knownTaskIds)
                         : "Task is not in knownTaskIds";
         for (final String taskId : tasksToKill) {
           try {
             taskRunner.shutdown(
                 taskId, reason
             );
           }
           catch (Exception e) {
             log.warn(e, "TaskRunner failed to clean up task: %s", taskId);
           }
         }
       }
   ```
   
   If we do this, then you won't have to add a new `getLogger()` method in `TaskRunner` interface. 

##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskRunnerTest.java
##########
@@ -0,0 +1,205 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.indexing.overlord;
+
+import com.google.common.base.Optional;
+import com.google.common.util.concurrent.ListenableFuture;
+import org.apache.druid.indexer.TaskStatus;
+import org.apache.druid.indexing.common.task.Task;
+import org.apache.druid.indexing.overlord.autoscaling.ScalingStats;
+import org.apache.druid.java.util.common.Pair;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.easymock.EasyMock;
+import org.junit.Assert;
+import org.junit.Test;
+
+import java.util.Collection;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.Executor;
+import java.util.concurrent.atomic.AtomicBoolean;
+
+public class TaskRunnerTest
+{
+  @Test
+  public void testShutdownWithInfoOnConstructsString()
+  {
+    final ToStringMock expensiveObject = new ToStringMock();
+
+    Logger infoLogger = EasyMock.createMock(Logger.class);
+    EasyMock.expect(infoLogger.isDebugEnabled()).andReturn(true).atLeastOnce();
+    EasyMock.replay(infoLogger);
+
+    MockTaskRunner taskRunner = new MockTaskRunner(infoLogger);
+    taskRunner.shutdown("taskid", "reason %s", new Object[]{expensiveObject});
+
+    Assert.assertTrue(expensiveObject.getWasCalled());
+  }
+
+  @Test
+  public void testShutdownWithInfoOffDoesNotCallToString()
+  {
+    final ToStringMock expensiveObject = new ToStringMock();
+
+    Logger infoLogger = EasyMock.createMock(Logger.class);
+    EasyMock.expect(infoLogger.isDebugEnabled()).andReturn(false).atLeastOnce();
+    EasyMock.replay(infoLogger);
+
+    MockTaskRunner taskRunner = new MockTaskRunner(infoLogger);
+    taskRunner.shutdown("taskid", "reason %s", new Object[]{expensiveObject});
+
+    Assert.assertFalse(expensiveObject.getWasCalled());
+  }
+
+  /**
+   * EasyMock does not support mocking of toString, so this provides a custom
+   * object implementation to track whether toString was called.
+   */
+  public static class ToStringMock

Review comment:
       nit: We should probably have these inner classes `ToStringMock` and `MockTaskRunner` as private static. 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       If debug logging is not enabled, then the following log line won't be reported. 
   https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskQueue.java#L582




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain merged pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain merged pull request #12097:
URL: https://github.com/apache/druid/pull/12097


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000600708


   Since the `reason` is passed to `shutdown` only for logging, why do we do what you have changed in the interface so that all subclasses gain this improvement?
   
   Something ike this:
   
   ```
   Logger getLogger();
   
   default void shutdown(String taskid, String reasonFormat, Object... args) 
    { 
      getLogger().info("Shutdown [%s] because: [" + reasonFormat + "]" + reasonFormat, taskid, args);
      shutdown(taskid); 
    } 
   ```
   
   
   Another thing that confuses me is that the flamechart shows that there's some bottlenecks at the logging, not the `StringUtils.format`, so to gain the performance improvement, the right thing here to do is to change the logging level to WARN or ERROR right? What you solved here is just to eliminate some string construction under WARN or ERROR logging level, right? 
   
   In another word, at INFO logging level, the gain seems not significant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r774835850



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
##########
@@ -553,11 +553,33 @@ public boolean isWorkerRunningTask(ZkWorker worker, String taskId)
    * Finds the worker running the task and forwards the shutdown signal to the worker.
    *
    * @param taskId - task id to shutdown
+   * @param reasonFormat - a StringUtils reason string for shutdown
+   * @param reasonArgs - args will be used during constructio of reason log line
+   */
+  @Override
+  public void shutdown(final String taskId, String reasonFormat, Object... reasonArgs)
+  {
+    if (log.isInfoEnabled()) {

Review comment:
       Checking explicitly avoids the extra `+` string concatenation in the case that info() is not enabled. This is minor, but while we are here...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1001778350


   Capturing some test case results from #12099 we have more concrete example:
   
   Alternating between these two lines:
   ```
     default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
     {
       // String reason = (getLogger().isInfoEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : null;
       String reason = StringUtils.format(reasonFormat, reasonArgs);
       shutdown(taskid, reason);
     }
   ```
   The existing approach to always format the message completes the TaskQueueScaleTest in 15.x seconds:
   ```
   [INFO] Running org.apache.druid.indexing.overlord.TaskQueueScaleTest
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 15.553 s - in org.apache.druid.indexing.overlord.TaskQueueScaleTest
   ```
   Using the new way to only format if necessary completes the test in 6.x seconds:
   ```
   [INFO] Running org.apache.druid.indexing.overlord.TaskQueueScaleTest
   [INFO] Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 6.529 s - in org.apache.druid.indexing.overlord.TaskQueueScaleTest
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r775693828



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,17 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    String reason = (getLogger().isInfoEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : null;

Review comment:
       fixed in 7c563eac03 and fb3cf82664




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r818168065



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       Looking at the callers, I see only TaskQueue#shutdown() that is passing in the task ids. So, I propose we modify code in TaskQueue#manageInternal() to something like this:
   ```
   // Kill tasks that shouldn't be running
       final Set<String> knownTaskIds = tasks
           .stream()
           .map(Task::getId)
           .collect(Collectors.toSet());
       final Set<String> tasksToKill = Sets.difference(runnerTaskFutures.keySet(), knownTaskIds);
       if (!tasksToKill.isEmpty()) {
         log.info("Asking taskRunner to clean up %,d tasks.", tasksToKill.size());
         // On large installations running several thousands of tasks,
         // concatenating the list of known task ids can be compupationally expensive.
         boolean logKnownTaskIds = log.isDebugEnabled();
         String reason = logKnownTaskIds
                         ? String.format("Task is not in knownTaskIds[%s]", knownTaskIds)
                         : "Task is not in knownTaskIds";
         for (final String taskId : tasksToKill) {
           try {
             taskRunner.shutdown(
                 taskId, reason
             );
           }
           catch (Exception e) {
             log.warn(e, "TaskRunner failed to clean up task: %s", taskId);
           }
         }
       }
   ```
   
   If we do this, then you won't have to add a new `getLogger()` method in `TaskRunner` interface. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058752068


   Submitted https://github.com/apache/druid/pull/12304 to fix CI.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] clintropolis commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
clintropolis commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058747362


   this PR has broken the ci, forbidden api check is failing, `String.format` shouldn't be called directly, use `StringUtils.format` instead


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 edited a comment on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 edited a comment on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000674887


   Some extra data, ..
   
   With reference to the TaskQueue-Manager thread, a capture with info enabled:
   ![image](https://user-images.githubusercontent.com/3196528/147322727-b26a0755-cf27-4dd4-99cc-fab3b50672a2.png)
   
   With respect to this thread only, the `TaskRunner::shutdown` call is 82% of CPU in the `manage()` loop, and, the useful part of the loop, `findWorkerRunningTask` consumes ~18%, the rest is logging related. By (1) turning off INFO logging on this path, and (2) avoiding the `StringUtils.format()` call, we get approx 4x more capacity through the `TaskQueue-Manager` thread.
   ![image](https://user-images.githubusercontent.com/3196528/147322939-71d3f6c6-2cff-4506-a82b-696b6d3c8c1f.png)
   
   <hr>
   
   ie: If you take the image just above ^, and eliminate the `Logger::info` and `StringUtils::format` paths, it opens a lot of capacity through this loop.
   
   This becomes more obvious when coupled to #12096 and the yet-to-be-raised PR to reduce contention on the TaskQueue loop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000674887


   Some extra data, ..
   
   With reference to the TaskQueue-Manager thread, a capture with info enabled:
   ![image](https://user-images.githubusercontent.com/3196528/147322727-b26a0755-cf27-4dd4-99cc-fab3b50672a2.png)
   
   With respect to this thread only, the `TaskRunner::shutdown` call is 82% of CPU in the `manage()` loop, and, the useful part of the loop, `findWorkerRunningTask` consumes ~18%, the rest is logging related. By (1) turning off INFO logging on this path, and (2) avoiding the `StringUtils.format()` call, we get approx 4x more capacity through the `TaskQueue-Manager` thread.
   ![image](https://user-images.githubusercontent.com/3196528/147322939-71d3f6c6-2cff-4506-a82b-696b6d3c8c1f.png)
   
   In other words, if you take the image just above ^, and eliminate the `Logger::info=` and `StringUtils::format` paths, it opens a lot of capacity through this loop.
   
   This becomes more obvious when coupled to #12096 and the yet-to-be-raised PR to reduce contention on the TaskQueue loop.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] samarthjain commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
samarthjain commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1057522274


   +1, looks good. I will merge after CI completes. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r818173605



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,19 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    // only calculate the 'reason' string for debug level logging
+    // in large clusters the 'reasonArgs' may be very large / expensive if it includes a list of tasks
+    String reason = (getLogger().isDebugEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : "debug log disabled";

Review comment:
       I like this solution, I'll swap over to this implementation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 edited a comment on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 edited a comment on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1058768666


   Oops my bad, we had that fixed last night but I broke it again. Sorry! Thx @samarthjain for addressing it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jasonk000 commented on pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
jasonk000 commented on pull request #12097:
URL: https://github.com/apache/druid/pull/12097#issuecomment-1000605742


   I like your suggestion to move it up to the interface level.
   
   WRT warn / error / etc. You're right; there's two perspectives on this:
   
   1. Yes, the biggest benefit is if logging is set to WARN/ERROR. In my stack, with thousands of tasks running, I specifically set warn level for this class in log4j xml as it is noisy, expensive and provides little benefit for regular operations, but this benefit is mitigated with the current codebase.
   
   2. Even in the case of info logging being enabled, the triple string concat is unnecessary.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r774829661



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
##########
@@ -553,11 +553,33 @@ public boolean isWorkerRunningTask(ZkWorker worker, String taskId)
    * Finds the worker running the task and forwards the shutdown signal to the worker.
    *
    * @param taskId - task id to shutdown
+   * @param reasonFormat - a StringUtils reason string for shutdown
+   * @param reasonArgs - args will be used during constructio of reason log line
+   */
+  @Override
+  public void shutdown(final String taskId, String reasonFormat, Object... reasonArgs)
+  {
+    if (log.isInfoEnabled()) {

Review comment:
       no need to check if isInfoEnabled, because the `log.info` internally checks it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] FrankChen021 commented on a change in pull request #12097: perf: eliminate double string concat in remote-task-runner shutdown logging

Posted by GitBox <gi...@apache.org>.
FrankChen021 commented on a change in pull request #12097:
URL: https://github.com/apache/druid/pull/12097#discussion_r775688676



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -88,11 +89,17 @@
    */
   void shutdown(String taskid, String reason);
 
-  default void shutdown(String taskid, String reasonFormat, Object... args)
+  default void shutdown(String taskid, String reasonFormat, Object... reasonArgs)
   {
-    shutdown(taskid, StringUtils.format(reasonFormat, args));
+    String reason = (getLogger().isInfoEnabled()) ? StringUtils.format(reasonFormat, reasonArgs) : null;

Review comment:
       better to add some comments here to elaborate why we do this here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org