You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2020/09/10 22:45:36 UTC

[GitHub] [druid] mghosh4 opened a new pull request #10379: Adding more worker metrics to Druid Overlord

mghosh4 opened a new pull request #10379:
URL: https://github.com/apache/druid/pull/10379


   Fixes #10378 .
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10379: Adding more task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r489123722



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskRunner.java
##########
@@ -121,4 +121,17 @@ default TaskLocation getTaskLocation(String taskId)
    * @return ScalingStats if the runner has an underlying resource which can scale, Optional.absent() otherwise
    */
   Optional<ScalingStats> getScalingStats();
+
+  /**
+   * APIs useful for emitting statistics for @TaskSlotCountStatsMonitor
+  */
+  long getTotalTaskSlotCount();

Review comment:
       How about adding new interfaces such as `TaskSlotCountable` and `WorkerCountable`? Then, we can avoid implementing unnecessary interfaces, e.g., the new methods in `SingleThreadedBackgroundRunner`, by implementing required interfaces only.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jon-wei merged pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jon-wei merged pull request #10379:
URL: https://github.com/apache/druid/pull/10379


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] mghosh4 commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
mghosh4 commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r493974286



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsMonitor.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.inject.Inject;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+public class TaskSlotCountStatsMonitor extends AbstractMonitor
+{
+  private final TaskSlotCountStatsProvider statsProvider;
+
+  @Inject
+  public TaskSlotCountStatsMonitor(
+          TaskSlotCountStatsProvider statsProvider
+  )
+  {
+    this.statsProvider = statsProvider;
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter)
+  {
+    emit(emitter, "taskSlot/total/count", statsProvider.getTotalTaskSlotCount());

Review comment:
       This is controlled from TaskMaster and it is the same design as `TaskCountStatsMonitor`. If you check that code, `taskRunner` is only set if the current overlord is the leader https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskMaster.java#L113. Then check the implementation of `getSuccessfulTaskCount`. It only returns a value if the `taskRunner` is set correctly or else `null`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r494511738



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsProvider.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+public interface TaskSlotCountStatsProvider
+{
+  /**
+   * Return the number of total task slots during emission period.
+   */
+  Long getTotalTaskSlotCount();

Review comment:
       Please annotate the return value with `@Nullable`. Same comment for other methods.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r493981418



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsMonitor.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.inject.Inject;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+public class TaskSlotCountStatsMonitor extends AbstractMonitor
+{
+  private final TaskSlotCountStatsProvider statsProvider;
+
+  @Inject
+  public TaskSlotCountStatsMonitor(
+          TaskSlotCountStatsProvider statsProvider
+  )
+  {
+    this.statsProvider = statsProvider;
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter)
+  {
+    emit(emitter, "taskSlot/total/count", statsProvider.getTotalTaskSlotCount());

Review comment:
       Hmm, if I'm reading code correctly, `TaskMaster.getTotalTaskSlotCount()` still returns 0 even when it's not a leader. Then, the monitor will emit meaningless 0s for non-leaders.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r494511738



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsProvider.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+public interface TaskSlotCountStatsProvider
+{
+  /**
+   * Return the number of total task slots during emission period.
+   */
+  Long getTotalTaskSlotCount();

Review comment:
       Please annotate the return value with `@Nullable`. Same comment for other methods.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r493935973



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsMonitor.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.inject.Inject;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+public class TaskSlotCountStatsMonitor extends AbstractMonitor
+{
+  private final TaskSlotCountStatsProvider statsProvider;
+
+  @Inject
+  public TaskSlotCountStatsMonitor(
+          TaskSlotCountStatsProvider statsProvider
+  )
+  {
+    this.statsProvider = statsProvider;
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter)
+  {
+    emit(emitter, "taskSlot/total/count", statsProvider.getTotalTaskSlotCount());

Review comment:
       I think this monitor should emit metrics only when `TaskMaster` is a leader. Check out what `TaskCountStatsMonitor` does. Probably new interfaces in `TaskSlotCountStatsProvider` should be able to return null when `TaskMaster` is not a leader.

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/hrtr/HttpRemoteTaskRunner.java
##########
@@ -1342,6 +1343,13 @@ public TaskLocation getTaskLocation(String taskId)
     ).collect(Collectors.toList());
   }
 
+  public Collection<ImmutableWorkerInfo> getBlackListedWorkers()
+  {
+    synchronized (blackListedWorkers) {

Review comment:
       It seems OK without this synchronization.

##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsProvider.java
##########
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+public interface TaskSlotCountStatsProvider
+{
+  /**
+   * Return the number of total task slots during emission period.
+   */
+  long getTotalTaskSlotCount();
+
+  /**
+   * Return the number of idle task slots during emission period.
+   */
+  long getIdleTaskSlotCount();
+
+  /**
+   * Return the number of used task slots during emission period.
+   */
+  long getUsedTaskSlotCount();
+
+  /**
+   * Return the number of lazy task slots during emission period.
+   */
+  long getLazyTaskSlotCount();
+
+  /**
+   * Return the number of blacklisted task slots during emission period.
+   */
+  long getBlacklistedTaskSlotCount();

Review comment:
       Please clarify the Javadoc of these methods too. It should be the number of task slots of lazy/blacklisted middleManagers and indexers.

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/RemoteTaskRunner.java
##########
@@ -867,6 +857,18 @@ private boolean tryAssignTask(final Task task, final RemoteTaskRunnerWorkItem ta
     }
   }
 
+  Map<String, ImmutableWorkerInfo> getWorkersEligibleToRunTasks()

Review comment:
       Even though this method was always called under a lock on `workerWithUnacknowledgedTask` before, this change seems OK because it will still be called under the same lock in `tryAssignTask`, and the new caller `getIdleTaskSlotCount` doesn't seem to require locking (as it doesn't update the `workerWithUnacknowledgedTask` map.

##########
File path: docs/operations/metrics.md
##########
@@ -186,6 +186,11 @@ Note: If the JVM does not support CPU time measurement for the current thread, i
 |`task/running/count`|Number of current running tasks. This metric is only available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
 |`task/pending/count`|Number of current pending tasks. This metric is only available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
 |`task/waiting/count`|Number of current waiting tasks. This metric is only available if the TaskCountStatsMonitor module is included.|dataSource.|Varies.|
+|`taskSlot/total/count`|Number of total task slots per emission period. This metric is only available if the TaskSlotCountStatsMonitor module is included.| |Varies.|
+|`taskSlot/idle/count`|Number of idle task slots per emission period. This metric is only available if the TaskSlotCountStatsMonitor module is included.| |Varies.|
+|`taskSlot/used/count`|Number of busy task slots per emission period. This metric is only available if the TaskSlotCountStatsMonitor module is included.| |Varies.|
+|`taskSlot/lazy/count`|Number of lazy task slots per emission period. This metric is only available if the TaskSlotCountStatsMonitor module is included.| |Varies.|
+|`taskSlot/blacklisted/count`|Number of blacklisted task slots per emission period. This metric is only available if the TaskSlotCountStatsMonitor module is included.| |Varies.|

Review comment:
       I left a same comment in https://github.com/apache/druid/issues/10378#issuecomment-698005593. Please clarify the description on these metrics.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] mghosh4 commented on a change in pull request #10379: Adding task slot count metrics to Druid Overlord

Posted by GitBox <gi...@apache.org>.
mghosh4 commented on a change in pull request #10379:
URL: https://github.com/apache/druid/pull/10379#discussion_r493974286



##########
File path: server/src/main/java/org/apache/druid/server/metrics/TaskSlotCountStatsMonitor.java
##########
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.server.metrics;
+
+import com.google.inject.Inject;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+import org.apache.druid.java.util.emitter.service.ServiceMetricEvent;
+import org.apache.druid.java.util.metrics.AbstractMonitor;
+
+public class TaskSlotCountStatsMonitor extends AbstractMonitor
+{
+  private final TaskSlotCountStatsProvider statsProvider;
+
+  @Inject
+  public TaskSlotCountStatsMonitor(
+          TaskSlotCountStatsProvider statsProvider
+  )
+  {
+    this.statsProvider = statsProvider;
+  }
+
+  @Override
+  public boolean doMonitor(ServiceEmitter emitter)
+  {
+    emit(emitter, "taskSlot/total/count", statsProvider.getTotalTaskSlotCount());

Review comment:
       This is controlled from TaskMaster and it is the same design as `TaskCountStatsMonitor`. If you check that code, taskRunner is only set if the current overlord is the leader https://github.com/apache/druid/blob/master/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskMaster.java#L113. Then check the implementation of `getSuccessfulTaskCount`. It only returns a value if the `taskRunner` is set correctly or else `null`.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org