You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/06/25 00:25:19 UTC

[GitHub] [incubator-pinot] mcvsubbu opened a new pull request #7091: Added TaskMetricsEmitted periodic controller job

mcvsubbu opened a new pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091


   This job runs every 5 mins by default and emits metrics about Minion tasks
   scheduled in Pinot. For now, the following metrics are emitted for each
   task type:
   
   - Number of running tasks
   - Number of running sub-tasks
   - Number of waiting sub-taks (unassigned to a minion as yet)
   - Number of error sub-tasks (completed with an error/exception)
   - Percent of sub-tasks in error
   - Percent of sub-tasks in waiting or running states.
   
   ## Description
   <!-- Add a description of your PR here.
   A good description should include pointers to an issue or design document, etc.
   -->
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu merged pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu merged pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659952371



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -249,15 +253,72 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
    * @return Map from task name to task state
    */
   public synchronized Map<String, TaskState> getTaskStates(String taskType) {
-    Map<String, TaskState> helixJobStates =
-        _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType)).getJobStates();
+    Map<String, TaskState> helixJobStates = new HashMap<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+
+    if (workflowContext == null) {
+      return helixJobStates;
+    }
+    helixJobStates = workflowContext.getJobStates();
     Map<String, TaskState> taskStates = new HashMap<>(helixJobStates.size());
     for (Map.Entry<String, TaskState> entry : helixJobStates.entrySet()) {
       taskStates.put(getPinotTaskName(entry.getKey()), entry.getValue());
     }
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    TaskCount taskCount = new TaskCount();
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+
+    if (jobContext == null) {
+      return taskCount;
+    }
+    Set<Integer> partitionSet = jobContext.getPartitionSet();
+    taskCount.addToTotal(partitionSet.size());
+    for (int partition : partitionSet) {
+      TaskPartitionState state = jobContext.getPartitionState(partition);
+      // Helix returns state as null if the task is not enqueued anywhere yet
+      if (state == null) {
+        // task is not yet assigned to a participant
+        taskCount.addToWaiting(1);
+      } else if (state.equals(TaskPartitionState.INIT) || state.equals(TaskPartitionState.RUNNING)) {
+        taskCount.addToRunning(1);
+      } else if (state.equals(TaskPartitionState.TASK_ERROR)) {
+        taskCount.addToError(1);
+      }
+    }
+    return taskCount;
+  }
+
+  /**
+   * Returns a set of Task names (in the form "Task_TestTask_1624403781879") that are in progress or not started yet.
+   *
+   * @param taskType
+   * @return Set of task names
+   */
+  public synchronized Set<String> getTasksInProgress(String taskType) {
+    Set<String> tasksInProgress = new HashSet<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+    if (workflowContext == null) {
+      return tasksInProgress;

Review comment:
       It'd be good to log a warning message here if the context is null.

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,20 +85,44 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    // No tasks before we start.
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
-
-    // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
-
-    // Should not generate more tasks
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
+
+    // Since we have two tables, two sub-tasks are generated -- one for each table.
+    // The default concurrent sub-tasks per minion instance is 1, and we have one minion
+    // instance spun up. So, one sub-tasks gets scheduled in a minion, and the other one
+    // waits.
+    verifyTaskCount(task1, 0, 1, 1, 2);
+    // Should generate one more task, with two sub-tasks. Both of these sub-tasks will wait
+    // since we have one minion instance that is still running one of the sub-tasks.

Review comment:
       If the only one minion instance is still running one of the sub-tasks, why does the `runningCount` below show 0?

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,18 +85,34 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
 
+    verifyTaskCount(task1, 0, 1, 1, 2);
     // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
+    String task2 = _taskManager.scheduleTask(TASK_TYPE);
+    assertNotNull(task2);
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task2));
+    verifyTaskCount(task2, 0, 2, 0, 2);

Review comment:
       Is it true that the `totalCount` here is the total number of sub-tasks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659239746



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));

Review comment:
       Will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869078521


   > How about splitting the task related metrics based on the task type?
   
   They are already split based on task type


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r658926621



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,81 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {
+  private static final Logger LOGGER = LoggerFactory.getLogger(TaskMetricsEmitter.class);
+  private final static String TASK_NAME = "TaskMetricsEmitter";
+
+  private final PinotHelixTaskResourceManager _helixTaskResourceManager;
+  private final ControllerMetrics _controllerMetrics;
+  private final LeadControllerManager _leadControllerManager;
+
+  public TaskMetricsEmitter(PinotHelixTaskResourceManager helixTaskResourceManager,
+      LeadControllerManager leadControllerManager, ControllerConf controllerConf, ControllerMetrics controllerMetrics) {
+    super(TASK_NAME, controllerConf.getTaskMetricsEmitterFrequencyInSeconds(),
+        controllerConf.getPeriodicTaskInitialDelayInSeconds());
+    _helixTaskResourceManager = helixTaskResourceManager;
+    _controllerMetrics = controllerMetrics;
+    _leadControllerManager = leadControllerManager;
+  }
+
+  @Override
+  protected final void runTask() {
+    // Make it so that only one controller returns the metric for all the tasks.
+    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
+      return;
+    }
+
+    // The call to get task types can take time if there are a lot of tasks.
+    // Potential optimization is to call it every (say) 30m if we detect a barrage of
+    // zk requests.
+    Set<String> taskTypes = _helixTaskResourceManager.getTaskTypes();
+    for (String taskType : taskTypes) {
+      TaskCount accumulated = new TaskCount();
+      Set<String> tasksInProgress = _helixTaskResourceManager.getTasksInProgress(taskType);
+      int numRunningTasks = 0;
+      for (String task : tasksInProgress) {
+        numRunningTasks++;

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659241012



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {

Review comment:
       Will do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869089682


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7091](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fd2ae8f) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/ac8f098e8ff883abb8323391480039f3a5b8aab7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac8f098) will **decrease** coverage by `8.02%`.
   > The diff coverage is `6.45%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/7091/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7091      +/-   ##
   ============================================
   - Coverage     73.49%   65.47%   -8.03%     
     Complexity       91       91              
   ============================================
     Files          1492     1493       +1     
     Lines         73422    73511      +89     
     Branches      10574    10587      +13     
   ============================================
   - Hits          53960    48129    -5831     
   - Misses        15927    21982    +6055     
   + Partials       3535     3400     -135     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `?` | |
   | unittests | `65.47% <6.45%> (-0.08%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/controller/ControllerConf.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9Db250cm9sbGVyQ29uZi5qYXZh) | `49.10% <0.00%> (-4.08%)` | :arrow_down: |
   | [...lix/core/minion/PinotHelixTaskResourceManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdEhlbGl4VGFza1Jlc291cmNlTWFuYWdlci5qYXZh) | `2.94% <0.00%> (-80.02%)` | :arrow_down: |
   | [...controller/helix/core/minion/PinotTaskManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdFRhc2tNYW5hZ2VyLmphdmE=) | `4.81% <0.00%> (-23.23%)` | :arrow_down: |
   | [...ntroller/helix/core/minion/TaskMetricsEmitter.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrTWV0cmljc0VtaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...g/apache/pinot/common/metrics/ControllerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyR2F1Z2UuamF2YQ==) | `97.22% <100.00%> (+0.55%)` | :arrow_up: |
   | [...a/org/apache/pinot/minion/metrics/MinionMeter.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vbWV0cmljcy9NaW5pb25NZXRlci5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/common/metrics/BrokerQueryPhase.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Ccm9rZXJRdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [.../apache/pinot/minion/metrics/MinionQueryPhase.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vbWV0cmljcy9NaW5pb25RdWVyeVBoYXNlLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pache/pinot/common/utils/grpc/GrpcQueryClient.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdXRpbHMvZ3JwYy9HcnBjUXVlcnlDbGllbnQuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...pinot/minion/exception/TaskCancelledException.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtbWluaW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9taW5pb24vZXhjZXB0aW9uL1Rhc2tDYW5jZWxsZWRFeGNlcHRpb24uamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [348 more](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ac8f098...fd2ae8f](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659947333



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,18 +85,34 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
 
+    verifyTaskCount(task1, 0, 1, 1, 2);
     // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
+    String task2 = _taskManager.scheduleTask(TASK_TYPE);
+    assertNotNull(task2);
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task2));
+    verifyTaskCount(task2, 0, 2, 0, 2);

Review comment:
       OK, I have not changed the test conditions from before. nevertheless, I will try to document the test a bit.

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,20 +85,44 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    // No tasks before we start.
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
-
-    // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
-
-    // Should not generate more tasks
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
+
+    // Since we have two tables, two sub-tasks are generated -- one for each table.
+    // The default concurrent sub-tasks per minion instance is 1, and we have one minion
+    // instance spun up. So, one sub-tasks gets scheduled in a minion, and the other one
+    // waits.
+    verifyTaskCount(task1, 0, 1, 1, 2);
+    // Should generate one more task, with two sub-tasks. Both of these sub-tasks will wait
+    // since we have one minion instance that is still running one of the sub-tasks.

Review comment:
       Minions run sub-tasks. We queue tasks that have sub-tasks. Task1 had two sub-tasks out of which one is running in minion. Task2 had two sub-tasks, but none of them can run on minion because there is already _a_ sub-task running on minion. Therefore, task1's count shows 1 running, and task2's count shows 0 running.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu merged pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu merged pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659090404



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+    Set<Integer> partitionSet = jobContext.getPartitionSet();
+    TaskCount taskCount = new TaskCount();
+    taskCount.addToTotal(partitionSet.size());
+    for (int partition : partitionSet) {
+      TaskPartitionState state = jobContext.getPartitionState(partition);
+      // Helix returns state as null if the task is not enqueued anywhere yet
+      if (state == null) {
+        // task is not yet assigned to a participant
+        taskCount.addToWaiting(1);
+      } else if (state.equals(TaskPartitionState.INIT) || state.equals(TaskPartitionState.RUNNING)) {
+        taskCount.addToRunning(1);
+      } else if (state.equals(TaskPartitionState.TASK_ERROR)) {
+        taskCount.addToError(1);
+      }
+    }
+    return taskCount;
+  }
+
+  /**
+   * Returns a set of Task names (in the form "Task_TestTask_1624403781879") that are in progress or not started yet.
+   *
+   * @param taskType
+   * @return Set of task names
+   */
+  public synchronized Set<String> getTasksInProgress(String taskType) {
+    Set<String> tasksInProgress = new HashSet<>();
+    Map<String, TaskState> helixJobStates =

Review comment:
       Same for the returned object here.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {
+  private static final Logger LOGGER = LoggerFactory.getLogger(TaskMetricsEmitter.class);
+  private final static String TASK_NAME = "TaskMetricsEmitter";
+
+  private final PinotHelixTaskResourceManager _helixTaskResourceManager;
+  private final ControllerMetrics _controllerMetrics;
+  private final LeadControllerManager _leadControllerManager;
+
+  public TaskMetricsEmitter(PinotHelixTaskResourceManager helixTaskResourceManager,
+      LeadControllerManager leadControllerManager, ControllerConf controllerConf, ControllerMetrics controllerMetrics) {
+    super(TASK_NAME, controllerConf.getTaskMetricsEmitterFrequencyInSeconds(),
+        controllerConf.getPeriodicTaskInitialDelayInSeconds());
+    _helixTaskResourceManager = helixTaskResourceManager;
+    _controllerMetrics = controllerMetrics;
+    _leadControllerManager = leadControllerManager;
+  }
+
+  @Override
+  protected final void runTask() {
+    // Make it so that only one controller returns the metric for all the tasks.
+    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
+      return;
+    }
+
+    // The call to get task types can take time if there are a lot of tasks.

Review comment:
       It'd be good to mention the default frequency is 5 mins here.

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {

Review comment:
       It'd be good to add some javadoc here?

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));

Review comment:
       JobContext could be null according to this code from Helix:
   ```
    protected static JobContext getJobContext(HelixPropertyStore<ZNRecord> propertyStore, String jobResource) {
       ZNRecord r = (ZNRecord)propertyStore.get(Joiner.on("/").join("/TaskRebalancer", jobResource, new Object[]{"Context"}), (Stat)null, AccessOption.PERSISTENT);
       return r != null ? new JobContext(r) : null;
     }
    ```
    It'd be good to handle the null case here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869089682


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7091](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (cb90b87) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/ac8f098e8ff883abb8323391480039f3a5b8aab7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac8f098) will **decrease** coverage by `31.68%`.
   > The diff coverage is `47.31%`.
   
   > :exclamation: Current head cb90b87 differs from pull request most recent head fd2ae8f. Consider uploading reports for the commit fd2ae8f to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/7091/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7091       +/-   ##
   =============================================
   - Coverage     73.49%   41.80%   -31.69%     
   + Complexity       91        7       -84     
   =============================================
     Files          1492     1493        +1     
     Lines         73422    73511       +89     
     Branches      10574    10587       +13     
   =============================================
   - Hits          53960    30733    -23227     
   - Misses        15927    40199    +24272     
   + Partials       3535     2579      -956     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `41.80% <47.31%> (+0.29%)` | :arrow_up: |
   | unittests | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/controller/ControllerConf.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9Db250cm9sbGVyQ29uZi5qYXZh) | `44.64% <0.00%> (-8.54%)` | :arrow_down: |
   | [...ntroller/helix/core/minion/TaskMetricsEmitter.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrTWV0cmljc0VtaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...lix/core/minion/PinotHelixTaskResourceManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdEhlbGl4VGFza1Jlc291cmNlTWFuYWdlci5qYXZh) | `79.41% <74.00%> (-3.55%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyR2F1Z2UuamF2YQ==) | `97.22% <100.00%> (+0.55%)` | :arrow_up: |
   | [...controller/helix/core/minion/PinotTaskManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdFRhc2tNYW5hZ2VyLmphdmE=) | `27.77% <100.00%> (-0.27%)` | :arrow_down: |
   | [...c/main/java/org/apache/pinot/common/tier/Tier.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/config/table/TableType.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1RhYmxlVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [945 more](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ac8f098...fd2ae8f](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r658926450



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method helps compute the percentage done for a task (that may have many partitions or sub-tasks).
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return a pair of integers, first one being the total number partitions and the second being number of partitions
+   * still running or yet to run.
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+    TaskCount taskCount = new TaskCount();
+    for (int partition : jobContext.getPartitionSet()) {
+      taskCount.addToTotal(1);

Review comment:
       Done

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method helps compute the percentage done for a task (that may have many partitions or sub-tasks).
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return a pair of integers, first one being the total number partitions and the second being number of partitions

Review comment:
       Done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659947333



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,18 +85,34 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
 
+    verifyTaskCount(task1, 0, 1, 1, 2);
     // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
+    String task2 = _taskManager.scheduleTask(TASK_TYPE);
+    assertNotNull(task2);
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task2));
+    verifyTaskCount(task2, 0, 2, 0, 2);

Review comment:
       OK, I have not changed the test conditions from before. nevertheless, I will try to document the test a bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869089682


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7091](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fb2e3de) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/ac8f098e8ff883abb8323391480039f3a5b8aab7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac8f098) will **decrease** coverage by `0.02%`.
   > The diff coverage is `47.31%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/7091/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@             Coverage Diff              @@
   ##             master    #7091      +/-   ##
   ============================================
   - Coverage     73.49%   73.46%   -0.03%     
     Complexity       91       91              
   ============================================
     Files          1492     1493       +1     
     Lines         73422    73511      +89     
     Branches      10574    10587      +13     
   ============================================
   + Hits          53960    54005      +45     
   - Misses        15927    15967      +40     
   - Partials       3535     3539       +4     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `41.56% <47.31%> (+0.05%)` | :arrow_up: |
   | unittests | `65.44% <6.45%> (-0.10%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/controller/ControllerConf.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9Db250cm9sbGVyQ29uZi5qYXZh) | `52.23% <0.00%> (-0.95%)` | :arrow_down: |
   | [...ntroller/helix/core/minion/TaskMetricsEmitter.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrTWV0cmljc0VtaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...lix/core/minion/PinotHelixTaskResourceManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdEhlbGl4VGFza1Jlc291cmNlTWFuYWdlci5qYXZh) | `79.41% <74.00%> (-3.55%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyR2F1Z2UuamF2YQ==) | `97.22% <100.00%> (+0.55%)` | :arrow_up: |
   | [...controller/helix/core/minion/PinotTaskManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdFRhc2tNYW5hZ2VyLmphdmE=) | `27.77% <100.00%> (-0.27%)` | :arrow_down: |
   | [...ller/helix/core/minion/TaskTypeMetricsUpdater.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrVHlwZU1ldHJpY3NVcGRhdGVyLmphdmE=) | `80.00% <0.00%> (-20.00%)` | :arrow_down: |
   | [...operator/filter/RangeIndexBasedFilterOperator.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29yZS9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29yZS9vcGVyYXRvci9maWx0ZXIvUmFuZ2VJbmRleEJhc2VkRmlsdGVyT3BlcmF0b3IuamF2YQ==) | `40.42% <0.00%> (-4.26%)` | :arrow_down: |
   | [.../impl/dictionary/BaseOffHeapMutableDictionary.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc2VnbWVudC1sb2NhbC9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3Qvc2VnbWVudC9sb2NhbC9yZWFsdGltZS9pbXBsL2RpY3Rpb25hcnkvQmFzZU9mZkhlYXBNdXRhYmxlRGljdGlvbmFyeS5qYXZh) | `84.00% <0.00%> (-3.34%)` | :arrow_down: |
   | [...not/broker/broker/helix/ClusterChangeMediator.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtYnJva2VyL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9icm9rZXIvYnJva2VyL2hlbGl4L0NsdXN0ZXJDaGFuZ2VNZWRpYXRvci5qYXZh) | `74.72% <0.00%> (-2.20%)` | :arrow_down: |
   | [...lix/core/realtime/PinotRealtimeSegmentManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL3JlYWx0aW1lL1Bpbm90UmVhbHRpbWVTZWdtZW50TWFuYWdlci5qYXZh) | `78.97% <0.00%> (-2.06%)` | :arrow_down: |
   | ... and [14 more](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ac8f098...fb2e3de](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659966207



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,20 +85,44 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    // No tasks before we start.
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
-
-    // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
-
-    // Should not generate more tasks
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
+
+    // Since we have two tables, two sub-tasks are generated -- one for each table.
+    // The default concurrent sub-tasks per minion instance is 1, and we have one minion
+    // instance spun up. So, one sub-tasks gets scheduled in a minion, and the other one
+    // waits.
+    verifyTaskCount(task1, 0, 1, 1, 2);
+    // Should generate one more task, with two sub-tasks. Both of these sub-tasks will wait
+    // since we have one minion instance that is still running one of the sub-tasks.

Review comment:
       Minions run sub-tasks. We queue tasks that have sub-tasks. Task1 had two sub-tasks out of which one is running in minion. Task2 had two sub-tasks, but none of them can run on minion because there is already _a_ sub-task running on minion. Therefore, task1's count shows 1 running, and task2's count shows 0 running.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869083664


   @jackjlli  can u please take another look? I have addressed all comments and also rebased the code so there are no conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659240053



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,80 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {
+  private static final Logger LOGGER = LoggerFactory.getLogger(TaskMetricsEmitter.class);
+  private final static String TASK_NAME = "TaskMetricsEmitter";
+
+  private final PinotHelixTaskResourceManager _helixTaskResourceManager;
+  private final ControllerMetrics _controllerMetrics;
+  private final LeadControllerManager _leadControllerManager;
+
+  public TaskMetricsEmitter(PinotHelixTaskResourceManager helixTaskResourceManager,
+      LeadControllerManager leadControllerManager, ControllerConf controllerConf, ControllerMetrics controllerMetrics) {
+    super(TASK_NAME, controllerConf.getTaskMetricsEmitterFrequencyInSeconds(),
+        controllerConf.getPeriodicTaskInitialDelayInSeconds());
+    _helixTaskResourceManager = helixTaskResourceManager;
+    _controllerMetrics = controllerMetrics;
+    _leadControllerManager = leadControllerManager;
+  }
+
+  @Override
+  protected final void runTask() {
+    // Make it so that only one controller returns the metric for all the tasks.
+    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
+      return;
+    }
+
+    // The call to get task types can take time if there are a lot of tasks.

Review comment:
       I don't want to mention the number here -- if someone changes the default for some reason, they won't know to change the comment. I will add some comments to the effect that this is run fairly frequently.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r658472544



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/TaskMetricsEmitter.java
##########
@@ -0,0 +1,81 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.controller.helix.core.minion;
+
+import java.util.Set;
+import org.apache.pinot.common.metrics.ControllerGauge;
+import org.apache.pinot.common.metrics.ControllerMetrics;
+import org.apache.pinot.controller.ControllerConf;
+import org.apache.pinot.controller.LeadControllerManager;
+import org.apache.pinot.controller.helix.core.minion.PinotHelixTaskResourceManager.TaskCount;
+import org.apache.pinot.core.periodictask.BasePeriodicTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+public class TaskMetricsEmitter extends BasePeriodicTask {
+  private static final Logger LOGGER = LoggerFactory.getLogger(TaskMetricsEmitter.class);
+  private final static String TASK_NAME = "TaskMetricsEmitter";
+
+  private final PinotHelixTaskResourceManager _helixTaskResourceManager;
+  private final ControllerMetrics _controllerMetrics;
+  private final LeadControllerManager _leadControllerManager;
+
+  public TaskMetricsEmitter(PinotHelixTaskResourceManager helixTaskResourceManager,
+      LeadControllerManager leadControllerManager, ControllerConf controllerConf, ControllerMetrics controllerMetrics) {
+    super(TASK_NAME, controllerConf.getTaskMetricsEmitterFrequencyInSeconds(),
+        controllerConf.getPeriodicTaskInitialDelayInSeconds());
+    _helixTaskResourceManager = helixTaskResourceManager;
+    _controllerMetrics = controllerMetrics;
+    _leadControllerManager = leadControllerManager;
+  }
+
+  @Override
+  protected final void runTask() {
+    // Make it so that only one controller returns the metric for all the tasks.
+    if (!_leadControllerManager.isLeaderForTable(TASK_NAME)) {
+      return;
+    }
+
+    // The call to get task types can take time if there are a lot of tasks.
+    // Potential optimization is to call it every (say) 30m if we detect a barrage of
+    // zk requests.
+    Set<String> taskTypes = _helixTaskResourceManager.getTaskTypes();
+    for (String taskType : taskTypes) {
+      TaskCount accumulated = new TaskCount();
+      Set<String> tasksInProgress = _helixTaskResourceManager.getTasksInProgress(taskType);
+      int numRunningTasks = 0;
+      for (String task : tasksInProgress) {
+        numRunningTasks++;

Review comment:
       Instead of `numRunningTasks++;` in for loop, how about declare `int numRunningTasks = tasksInProgress.size()` out of the for loop?

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method helps compute the percentage done for a task (that may have many partitions or sub-tasks).
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return a pair of integers, first one being the total number partitions and the second being number of partitions
+   * still running or yet to run.
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+    TaskCount taskCount = new TaskCount();
+    for (int partition : jobContext.getPartitionSet()) {
+      taskCount.addToTotal(1);

Review comment:
       `taskCount.addToTotal(jobContext.getPartitionSet().size());` out of the for loop?

##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -258,6 +262,50 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
     return taskStates;
   }
 
+  /**
+   * This method helps compute the percentage done for a task (that may have many partitions or sub-tasks).
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return a pair of integers, first one being the total number partitions and the second being number of partitions

Review comment:
       Please update the javadoc of return type here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659952371



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -249,15 +253,72 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
    * @return Map from task name to task state
    */
   public synchronized Map<String, TaskState> getTaskStates(String taskType) {
-    Map<String, TaskState> helixJobStates =
-        _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType)).getJobStates();
+    Map<String, TaskState> helixJobStates = new HashMap<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+
+    if (workflowContext == null) {
+      return helixJobStates;
+    }
+    helixJobStates = workflowContext.getJobStates();
     Map<String, TaskState> taskStates = new HashMap<>(helixJobStates.size());
     for (Map.Entry<String, TaskState> entry : helixJobStates.entrySet()) {
       taskStates.put(getPinotTaskName(entry.getKey()), entry.getValue());
     }
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    TaskCount taskCount = new TaskCount();
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+
+    if (jobContext == null) {
+      return taskCount;
+    }
+    Set<Integer> partitionSet = jobContext.getPartitionSet();
+    taskCount.addToTotal(partitionSet.size());
+    for (int partition : partitionSet) {
+      TaskPartitionState state = jobContext.getPartitionState(partition);
+      // Helix returns state as null if the task is not enqueued anywhere yet
+      if (state == null) {
+        // task is not yet assigned to a participant
+        taskCount.addToWaiting(1);
+      } else if (state.equals(TaskPartitionState.INIT) || state.equals(TaskPartitionState.RUNNING)) {
+        taskCount.addToRunning(1);
+      } else if (state.equals(TaskPartitionState.TASK_ERROR)) {
+        taskCount.addToError(1);
+      }
+    }
+    return taskCount;
+  }
+
+  /**
+   * Returns a set of Task names (in the form "Task_TestTask_1624403781879") that are in progress or not started yet.
+   *
+   * @param taskType
+   * @return Set of task names
+   */
+  public synchronized Set<String> getTasksInProgress(String taskType) {
+    Set<String> tasksInProgress = new HashSet<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+    if (workflowContext == null) {
+      return tasksInProgress;

Review comment:
       It'd be good to log a warning message here if the context is null.

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,20 +85,44 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    // No tasks before we start.
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
-
-    // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
-
-    // Should not generate more tasks
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
+
+    // Since we have two tables, two sub-tasks are generated -- one for each table.
+    // The default concurrent sub-tasks per minion instance is 1, and we have one minion
+    // instance spun up. So, one sub-tasks gets scheduled in a minion, and the other one
+    // waits.
+    verifyTaskCount(task1, 0, 1, 1, 2);
+    // Should generate one more task, with two sub-tasks. Both of these sub-tasks will wait
+    // since we have one minion instance that is still running one of the sub-tasks.

Review comment:
       If the only one minion instance is still running one of the sub-tasks, why does the `runningCount` below show 0?

##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,18 +85,34 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
 
+    verifyTaskCount(task1, 0, 1, 1, 2);
     // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
+    String task2 = _taskManager.scheduleTask(TASK_TYPE);
+    assertNotNull(task2);
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task2));
+    verifyTaskCount(task2, 0, 2, 0, 2);

Review comment:
       Is it true that the `totalCount` here is the total number of sub-tasks?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] jackjlli commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
jackjlli commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r659362999



##########
File path: pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/SimpleMinionClusterIntegrationTest.java
##########
@@ -85,18 +85,34 @@ public void setUp()
     startMinion();
   }
 
+  private void verifyTaskCount(String task, int errors, int waiting, int running, int total) {
+    PinotHelixTaskResourceManager.TaskCount taskCount = _helixTaskResourceManager.getTaskCount(task);
+    assertEquals(taskCount.getError(), errors);
+    assertEquals(taskCount.getWaiting(), waiting);
+    assertEquals(taskCount.getRunning(), running);
+    assertEquals(taskCount.getTotal(), total);
+  }
+
   @Test
   public void testStopResumeDeleteTaskQueue() {
     // Hold the task
     HOLD.set(true);
+    assertEquals(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).size(),0);
+    verifyTaskCount("Task_" + TASK_TYPE + "_1624403781879", 0, 0, 0, 0);
 
     // Should create the task queues and generate a task
-    assertNotNull(_taskManager.scheduleTasks().get(TASK_TYPE));
+    String task1 = _taskManager.scheduleTasks().get(TASK_TYPE);
+    assertNotNull(task1);
     assertTrue(_helixTaskResourceManager.getTaskQueues()
         .contains(PinotHelixTaskResourceManager.getHelixJobQueueName(TASK_TYPE)));
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task1));
 
+    verifyTaskCount(task1, 0, 1, 1, 2);
     // Should generate one more task
-    assertNotNull(_taskManager.scheduleTask(TASK_TYPE));
+    String task2 = _taskManager.scheduleTask(TASK_TYPE);
+    assertNotNull(task2);
+    assertTrue(_helixTaskResourceManager.getTasksInProgress(TASK_TYPE).contains(task2));
+    verifyTaskCount(task2, 0, 2, 0, 2);

Review comment:
       Why does the total number of task remain unchanged even after scheduling one more task?
   
   Could we have some comment on why the number of running tasks becomes 0 from 1 after scheduling one more task here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter commented on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869089682


   # [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#7091](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (fb2e3de) into [master](https://codecov.io/gh/apache/incubator-pinot/commit/ac8f098e8ff883abb8323391480039f3a5b8aab7?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (ac8f098) will **decrease** coverage by `31.92%`.
   > The diff coverage is `47.31%`.
   
   [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-pinot/pull/7091/graphs/tree.svg?width=650&height=150&src=pr&token=4ibza2ugkz&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@              Coverage Diff              @@
   ##             master    #7091       +/-   ##
   =============================================
   - Coverage     73.49%   41.56%   -31.93%     
   + Complexity       91        7       -84     
   =============================================
     Files          1492     1493        +1     
     Lines         73422    73511       +89     
     Branches      10574    10587       +13     
   =============================================
   - Hits          53960    30557    -23403     
   - Misses        15927    40365    +24438     
   + Partials       3535     2589      -946     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | integration | `41.56% <47.31%> (+0.05%)` | :arrow_up: |
   | unittests | `?` | |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [...va/org/apache/pinot/controller/ControllerConf.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9Db250cm9sbGVyQ29uZi5qYXZh) | `44.64% <0.00%> (-8.54%)` | :arrow_down: |
   | [...ntroller/helix/core/minion/TaskMetricsEmitter.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9UYXNrTWV0cmljc0VtaXR0ZXIuamF2YQ==) | `0.00% <0.00%> (ø)` | |
   | [...lix/core/minion/PinotHelixTaskResourceManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdEhlbGl4VGFza1Jlc291cmNlTWFuYWdlci5qYXZh) | `79.41% <74.00%> (-3.55%)` | :arrow_down: |
   | [...g/apache/pinot/common/metrics/ControllerGauge.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vbWV0cmljcy9Db250cm9sbGVyR2F1Z2UuamF2YQ==) | `97.22% <100.00%> (+0.55%)` | :arrow_up: |
   | [...controller/helix/core/minion/PinotTaskManager.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29udHJvbGxlci9zcmMvbWFpbi9qYXZhL29yZy9hcGFjaGUvcGlub3QvY29udHJvbGxlci9oZWxpeC9jb3JlL21pbmlvbi9QaW5vdFRhc2tNYW5hZ2VyLmphdmE=) | `27.77% <100.00%> (-0.27%)` | :arrow_down: |
   | [...c/main/java/org/apache/pinot/common/tier/Tier.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...ava/org/apache/pinot/spi/data/MetricFieldSpec.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvZGF0YS9NZXRyaWNGaWVsZFNwZWMuamF2YQ==) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...va/org/apache/pinot/spi/utils/BigDecimalUtils.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvdXRpbHMvQmlnRGVjaW1hbFV0aWxzLmphdmE=) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...java/org/apache/pinot/common/tier/TierFactory.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3QtY29tbW9uL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9jb21tb24vdGllci9UaWVyRmFjdG9yeS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | [...a/org/apache/pinot/spi/config/table/TableType.java](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGlub3Qtc3BpL3NyYy9tYWluL2phdmEvb3JnL2FwYWNoZS9waW5vdC9zcGkvY29uZmlnL3RhYmxlL1RhYmxlVHlwZS5qYXZh) | `0.00% <0.00%> (-100.00%)` | :arrow_down: |
   | ... and [940 more](https://codecov.io/gh/apache/incubator-pinot/pull/7091/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [ac8f098...fb2e3de](https://codecov.io/gh/apache/incubator-pinot/pull/7091?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#discussion_r660872726



##########
File path: pinot-controller/src/main/java/org/apache/pinot/controller/helix/core/minion/PinotHelixTaskResourceManager.java
##########
@@ -249,15 +253,72 @@ public synchronized String submitTask(List<PinotTaskConfig> pinotTaskConfigs, St
    * @return Map from task name to task state
    */
   public synchronized Map<String, TaskState> getTaskStates(String taskType) {
-    Map<String, TaskState> helixJobStates =
-        _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType)).getJobStates();
+    Map<String, TaskState> helixJobStates = new HashMap<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+
+    if (workflowContext == null) {
+      return helixJobStates;
+    }
+    helixJobStates = workflowContext.getJobStates();
     Map<String, TaskState> taskStates = new HashMap<>(helixJobStates.size());
     for (Map.Entry<String, TaskState> entry : helixJobStates.entrySet()) {
       taskStates.put(getPinotTaskName(entry.getKey()), entry.getValue());
     }
     return taskStates;
   }
 
+  /**
+   * This method returns a count of sub-tasks in various states, given the top-level task name.
+   * @param parentTaskName (e.g. "Task_TestTask_1624403781879")
+   * @return TaskCount object
+   */
+  public synchronized TaskCount getTaskCount(String parentTaskName) {
+    TaskCount taskCount = new TaskCount();
+    JobContext jobContext = _taskDriver.getJobContext(getHelixJobName(parentTaskName));
+
+    if (jobContext == null) {
+      return taskCount;
+    }
+    Set<Integer> partitionSet = jobContext.getPartitionSet();
+    taskCount.addToTotal(partitionSet.size());
+    for (int partition : partitionSet) {
+      TaskPartitionState state = jobContext.getPartitionState(partition);
+      // Helix returns state as null if the task is not enqueued anywhere yet
+      if (state == null) {
+        // task is not yet assigned to a participant
+        taskCount.addToWaiting(1);
+      } else if (state.equals(TaskPartitionState.INIT) || state.equals(TaskPartitionState.RUNNING)) {
+        taskCount.addToRunning(1);
+      } else if (state.equals(TaskPartitionState.TASK_ERROR)) {
+        taskCount.addToError(1);
+      }
+    }
+    return taskCount;
+  }
+
+  /**
+   * Returns a set of Task names (in the form "Task_TestTask_1624403781879") that are in progress or not started yet.
+   *
+   * @param taskType
+   * @return Set of task names
+   */
+  public synchronized Set<String> getTasksInProgress(String taskType) {
+    Set<String> tasksInProgress = new HashSet<>();
+    WorkflowContext workflowContext = _taskDriver.getWorkflowContext(getHelixJobQueueName(taskType));
+    if (workflowContext == null) {
+      return tasksInProgress;

Review comment:
       The warning message may appear every 5 minutes. Not a good idea. I will leave it as it is.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] codecov-commenter edited a comment on pull request #7091: Added TaskMetricsEmitted periodic controller job

Posted by GitBox <gi...@apache.org>.
codecov-commenter edited a comment on pull request #7091:
URL: https://github.com/apache/incubator-pinot/pull/7091#issuecomment-869089682






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org