You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pinot.apache.org by GitBox <gi...@apache.org> on 2021/05/06 20:42:09 UTC

[GitHub] [incubator-pinot] mqliang opened a new pull request #6886: Capture system activities time such as OS paging, GC in totalThreadCuTimeNs

mqliang opened a new pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886


   It's possible that `totalThreadCpuTimeNs < totalWallClockTimeNs` when GC or OS paging happens during query processing. This PR make totalThreadCuTimeNs also capture time spending on GC/OC paging, etc.
   
   Please refer the comments for how the time was calculated. @siddharthteotia 
   
   cc @mcvsubbu 
   
   ## Description
   <!-- Add a description of your PR here.
   A good description should include pointers to an issue or design document, etc.
   -->
   ## Upgrade Notes
   Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)
   * [ ] Yes (Please label as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR fix a zero-downtime upgrade introduced earlier?
   * [ ] Yes (Please label this as **<code>backward-incompat</code>**, and complete the section below on Release Notes)
   
   Does this PR otherwise need attention when creating release notes? Things to consider:
   - New configuration options
   - Deprecation of configurations
   - Signature changes to public methods/interfaces
   - New plugins added or old plugins removed
   * [ ] Yes (Please label this PR as **<code>release-notes</code>** and complete the section on Release Notes)
   ## Release Notes
   <!-- If you have tagged this as either backward-incompat or release-notes,
   you MUST add text here that you would like to see appear in release notes of the
   next release. -->
   
   <!-- If you have a series of commits adding or enabling a feature, then
   add this section only in final commit that marks the feature completed.
   Refer to earlier release notes to see examples of text.
   -->
   ## Documentation
   <!-- If you have introduced a new feature or configuration, please add it to the documentation as well.
   See https://docs.pinot.apache.org/developers/developers-and-contributors/update-document
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r627771754



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
+    /*
+     * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process
+     * a query, this is because system activities time such as OS paging, GC, context switching are not captured by
+     * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases.
+     * - phase 1: single thread preparing. Time used: T1
+     * - phase 2: N threads processing segments in parallel, each thread use time T2
+     * - phase 3: GC/OS paging. Time used: T3
+     * - phase 4: single thread merging intermediate results blocks. Time used: T4
+     *
+     * Then we have following equations:
+     * - singleThreadCpuTimeNs = T1 + T4
+     * - multipleThreadCpuTimeNs = T2 * N
+     * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+     * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+     * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     *
+     * Thus:
+     * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+     * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+     * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     * = totalWallClockTimeNs + T2 * (N - 1)
+     * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+     */
+    if (totalThreadCpuTimeNs < totalWallClockTimeNs) {

Review comment:
       System activity (GC, OS paging , lock wait etc) that consumes  CPU is always there. Based on the verification that we did in production, for the case when totalThreadCpuTime < wallClock, there is more of it. 
   
   However, even if totalThreadCpuTime > wallClock which we saw in many cases, there is still some cpu activity unaccounted for. So, let's do this adjustment in both the cases. Let's not have this if check

##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
+    /*
+     * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process

Review comment:
       Nice job on detailed comments




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang closed pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang closed pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630468400



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       Let's say there are 8 tasks and 4 threads in the executor service. At any point in time, the executor service will run 4 worker threads/tasks on CPU assuming numCores >= 4
   
   So, let's say following happens
   task 1 - thread 1 -> threadCpuTime - t1
   task 2 - thread 2 -> threadCpuTime - t2
   task 3 - thread 3 -> threadCpuTime - t3
   task 4 - thread 4 -> threadCpuTime - t4
   
   task 5 - thread 1 -> threadCpuTime - t5
   task 6 - thread 2 -> threadCpuTime - t6
   task 7 - thread 3 -> threadCpuTime - t7
   task 8 - thread 4 -> threadCpuTime - t8
   
   Today, we compute the total as sum (t1... t8) to get total threadCpuTime across all query threads.
   So, in the adjustment equation when we do threadCpuTime / N , should N be used as 8 or 4
   
   I think N should be 4 because number of tasks is dependent on the number of operators (segments)
   
   GroupBy operator passes _numThreads as numOperators to BaseCombineOperator




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629764516



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       This is actually incorrect. 
   
   We basically want the number of query execution threads. The _numThreads argument in BaseCombineOperator is misleading (it always has been). The code goes over it in the loop and submits as many async tasks to the executor as the _numThreads. That doesn't really mean that executor service will use those many threads. 
   
   You might submit 100 tasks to an executor service of fixed pool of 8 threads. The threads will be 8.
   
   The server side query execution executor service is created during server startup and the number of threads are determined then and they remain the same since all queries share the executor service. They are referred to as (PQW) or worker threads. Each query gets a  single PQR (runner thread) and 1 or more worker (PQW) threads. So, we need to use PQW + PQR here to get the actual number of threads executing the query. Let's check the code where executor service is created and find out the number. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630603769



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,15 +128,23 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    /*
+     * _numTasks are number of async tasks submitted to the _executorService, but it does not mean Pinot server
+     * use those number of threads to concurrently process segments. Instead, if _executorService thread pool has
+     * less number of threads than _numTasks, the number of threads that used to concurrently process segments equals
+     * to the pool size.
+     */
+    int numServerThreads = Math.min(_numTasks, ResourceManager.DEFAULT_QUERY_WORKER_THREADS);

Review comment:
       very difficult, BaseCombineOperator does not have access to ResourceManager. Based on discussion offline, I add a TODO here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r627770476



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;

Review comment:
       As discussed offline, let's use actual wall clock time which we already have in QueryScheduler. Will have to move out the adjustment code from here to QueryScheduler etc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629762731



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -37,21 +36,51 @@ public InstanceResponseOperator(Operator combinedOperator) {
 
   @Override
   protected InstanceResponseBlock getNextBlock() {
-    ThreadTimer mainThreadTimer = new ThreadTimer();
-    mainThreadTimer.start();
-
+    long startWallClockTimeNs = System.nanoTime();
     IntermediateResultsBlock intermediateResultsBlock = (IntermediateResultsBlock) _operator.nextBlock();
     InstanceResponseBlock instanceResponseBlock = new InstanceResponseBlock(intermediateResultsBlock);
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
 
-    mainThreadTimer.stop();
+    int numServerThreads = intermediateResultsBlock.getNumServerThreads();
     long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+        calTotalThreadCpuTimeNs(totalWallClockTimeNs, multipleThreadCpuTimeNs, numServerThreads);
+
     dataTable.getMetadata().put(MetadataKey.THREAD_CPU_TIME_NS.getName(), String.valueOf(totalThreadCpuTimeNs));
 
     return instanceResponseBlock;
   }
 
+  /*
+   * Calculate totalThreadCpuTimeNs based on totalWallClockTimeNs, multipleThreadCpuTimeNs, and numServerThreads.
+   * System activities time such as OS paging, GC, context switching are not captured by totalThreadCpuTimeNs.
+   * For example, let's divide query processing into 4 phases:
+   * - phase 1: single thread preparing. Time used: T1
+   * - phase 2: N threads processing segments in parallel, each thread use time T2
+   * - phase 3: GC/OS paging. Time used: T3
+   * - phase 4: single thread merging intermediate results blocks. Time used: T4
+   *
+   * Then we have following equations:
+   * - singleThreadCpuTimeNs = T1 + T4
+   * - multipleThreadCpuTimeNs = T2 * N
+   * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+   * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+   * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   *
+   * Thus:
+   * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+   * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+   * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   * = totalWallClockTimeNs + T2 * (N - 1)
+   * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+   */
+  public static long calTotalThreadCpuTimeNs(long totalWallClockTimeNs, long multipleThreadCpuTimeNs, int numServerThreads) {
+    return totalWallClockTimeNs + multipleThreadCpuTimeNs * (numServerThreads - 1) / numServerThreads;
+  }

Review comment:
       We should use parentheses here for precedence I think. Also, we should use rounding. It should be :
   
   ```
   double a = multipleThreadCpuTimeNs / (double)numServerThreads ; 
   // we should not get 0 above and that's why casting to double
   double b = a * (numThreads - 1) => 
   long c = Math.round ((double)wallClock  + b)
   return c
   ```
   
   Please make sure unit tests cover all scenarios




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630521173



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -37,21 +36,51 @@ public InstanceResponseOperator(Operator combinedOperator) {
 
   @Override
   protected InstanceResponseBlock getNextBlock() {
-    ThreadTimer mainThreadTimer = new ThreadTimer();
-    mainThreadTimer.start();
-
+    long startWallClockTimeNs = System.nanoTime();
     IntermediateResultsBlock intermediateResultsBlock = (IntermediateResultsBlock) _operator.nextBlock();
     InstanceResponseBlock instanceResponseBlock = new InstanceResponseBlock(intermediateResultsBlock);
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
 
-    mainThreadTimer.stop();
+    int numServerThreads = intermediateResultsBlock.getNumServerThreads();
     long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+        calTotalThreadCpuTimeNs(totalWallClockTimeNs, multipleThreadCpuTimeNs, numServerThreads);
+
     dataTable.getMetadata().put(MetadataKey.THREAD_CPU_TIME_NS.getName(), String.valueOf(totalThreadCpuTimeNs));
 
     return instanceResponseBlock;
   }
 
+  /*
+   * Calculate totalThreadCpuTimeNs based on totalWallClockTimeNs, multipleThreadCpuTimeNs, and numServerThreads.
+   * System activities time such as OS paging, GC, context switching are not captured by totalThreadCpuTimeNs.
+   * For example, let's divide query processing into 4 phases:
+   * - phase 1: single thread preparing. Time used: T1
+   * - phase 2: N threads processing segments in parallel, each thread use time T2
+   * - phase 3: GC/OS paging. Time used: T3
+   * - phase 4: single thread merging intermediate results blocks. Time used: T4
+   *
+   * Then we have following equations:
+   * - singleThreadCpuTimeNs = T1 + T4
+   * - multipleThreadCpuTimeNs = T2 * N
+   * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+   * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+   * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   *
+   * Thus:
+   * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+   * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+   * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   * = totalWallClockTimeNs + T2 * (N - 1)
+   * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+   */
+  public static long calTotalThreadCpuTimeNs(long totalWallClockTimeNs, long multipleThreadCpuTimeNs, int numServerThreads) {
+    return totalWallClockTimeNs + multipleThreadCpuTimeNs * (numServerThreads - 1) / numServerThreads;
+  }

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia merged pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia merged pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630471185



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       So, let's change _numThreads to numTasks or something more meaningful
   
   We have 2 cases
   
   - GroupBy passes _numTasks as numOperators
   - Selection and everyone else passes _numTasks as a factor of number of cores see function  `public static int getNumThreadsForQuery(int numOperators)` in CombineOperatorUtils
   - 
   case 1 - query on 100 segments with 8 threads in worker executor service
   
   - group by - _numTasks = 100, _numOperators = 100, worker threads = 8
   - selection and everyone else - _numTasks = 8, _numOperators = 100, worker threads = 8
   
   case 2 - query on 4 segments with 8 threads in worker executor service
   
   - group by - _numTasks = 4, _numOperators = 4, worker threads = 8
   - selection and everyone else - _numTasks = 4, _numOperators = 4, worker threads = 8
   
   Let's have good tests for all these. You can also run OfflineClusterIntegrationTest in debug mode and see how _numOperators, _numTasks and numWorkerThreads look like




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630471185



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       So, let's change _numThreads to numTasks or something more meaningful
   
   We have 2 cases
   
   - GroupBy passes _numTasks as numOperators
   - Selection and everyone else passes _numTasks as a factor of number of cores see function  `public static int getNumThreadsForQuery(int numOperators)` in CombineOperatorUtils
   
   case 1 - query on 100 segments with 8 threads in worker executor service
   
   - group by - _numTasks = 100, _numOperators = 100, worker threads = 8
   - selection and everyone else - _numTasks = 8, _numOperators = 100, worker threads = 8
   
   case 2 - query on 4 segments with 8 threads in worker executor service
   
   - group by - _numTasks = 4, _numOperators = 4, worker threads = 8
   - selection and everyone else - _numTasks = 4, _numOperators = 4, worker threads = 8
   
   Let's have good tests for all these. You can also run OfflineClusterIntegrationTest in debug mode and see how _numOperators, _numTasks and numWorkerThreads look like




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630590682



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,15 +128,23 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    /*
+     * _numTasks are number of async tasks submitted to the _executorService, but it does not mean Pinot server
+     * use those number of threads to concurrently process segments. Instead, if _executorService thread pool has
+     * less number of threads than _numTasks, the number of threads that used to concurrently process segments equals
+     * to the pool size.
+     */
+    int numServerThreads = Math.min(_numTasks, ResourceManager.DEFAULT_QUERY_WORKER_THREADS);

Review comment:
       Default should not be used. From the executor service, can we get the number of threads it was configured with ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] richardstartin commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628477390



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();

Review comment:
       It seems it is, but this would already have happened long before this call https://github.com/apache/incubator-pinot/blob/75bd7b9317058e7a2d1c2b6e3f9ce3069109ba1d/pinot-core/src/main/java/org/apache/pinot/core/query/request/context/ThreadTimer.java#L30




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630519892



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/operator/ThreadCpuTimeMeasurementTest.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.operator;
+
+import org.locationtech.jts.util.Assert;
+import org.testng.annotations.Test;
+
+
+public class ThreadCpuTimeMeasurementTest {
+
+  @Test
+  public void testAdditionTransformFunction() {
+    class testCase {
+      final long totalWallClockTimeNs;
+      final long multipleThreadCpuTimeNs;
+      final int numServerThreads;
+      final long totalThreadCpuTimeNs;
+
+      testCase(long totalWallClockTimeNs, long multipleThreadCpuTimeNs, int numServerThreads,
+          long totalThreadCpuTimeNs) {
+        this.totalWallClockTimeNs = totalWallClockTimeNs;
+        this.multipleThreadCpuTimeNs = multipleThreadCpuTimeNs;
+        this.numServerThreads = numServerThreads;
+        this.totalThreadCpuTimeNs = totalThreadCpuTimeNs;
+      }
+    }
+
+    testCase[] testCases =
+        new testCase[]{
+            new testCase(4245673, 7124487, 3, 8995331),

Review comment:
       Those numbers are collected from the data. We have case where `totalWallClockTimeNs < multipleThreadCpuTimeNs ` and `totalWallClockTimeNs > multipleThreadCpuTimeNs ` (due to GC/OS paging, etc). I also add a special case where `numServerThreads =1`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629775907



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/operator/ThreadCpuTimeMeasurementTest.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.operator;
+
+import org.locationtech.jts.util.Assert;
+import org.testng.annotations.Test;
+
+
+public class ThreadCpuTimeMeasurementTest {
+
+  @Test
+  public void testAdditionTransformFunction() {
+    class testCase {
+      final long totalWallClockTimeNs;
+      final long multipleThreadCpuTimeNs;
+      final int numServerThreads;
+      final long totalThreadCpuTimeNs;
+
+      testCase(long totalWallClockTimeNs, long multipleThreadCpuTimeNs, int numServerThreads,
+          long totalThreadCpuTimeNs) {
+        this.totalWallClockTimeNs = totalWallClockTimeNs;
+        this.multipleThreadCpuTimeNs = multipleThreadCpuTimeNs;
+        this.numServerThreads = numServerThreads;
+        this.totalThreadCpuTimeNs = totalThreadCpuTimeNs;
+      }
+    }
+
+    testCase[] testCases =
+        new testCase[]{
+            new testCase(4245673, 7124487, 3, 8995331),

Review comment:
       Can you please take a look at the production data we collected during verification and then try to make sure everything is covered ?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629762731



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -37,21 +36,51 @@ public InstanceResponseOperator(Operator combinedOperator) {
 
   @Override
   protected InstanceResponseBlock getNextBlock() {
-    ThreadTimer mainThreadTimer = new ThreadTimer();
-    mainThreadTimer.start();
-
+    long startWallClockTimeNs = System.nanoTime();
     IntermediateResultsBlock intermediateResultsBlock = (IntermediateResultsBlock) _operator.nextBlock();
     InstanceResponseBlock instanceResponseBlock = new InstanceResponseBlock(intermediateResultsBlock);
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
 
-    mainThreadTimer.stop();
+    int numServerThreads = intermediateResultsBlock.getNumServerThreads();
     long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+        calTotalThreadCpuTimeNs(totalWallClockTimeNs, multipleThreadCpuTimeNs, numServerThreads);
+
     dataTable.getMetadata().put(MetadataKey.THREAD_CPU_TIME_NS.getName(), String.valueOf(totalThreadCpuTimeNs));
 
     return instanceResponseBlock;
   }
 
+  /*
+   * Calculate totalThreadCpuTimeNs based on totalWallClockTimeNs, multipleThreadCpuTimeNs, and numServerThreads.
+   * System activities time such as OS paging, GC, context switching are not captured by totalThreadCpuTimeNs.
+   * For example, let's divide query processing into 4 phases:
+   * - phase 1: single thread preparing. Time used: T1
+   * - phase 2: N threads processing segments in parallel, each thread use time T2
+   * - phase 3: GC/OS paging. Time used: T3
+   * - phase 4: single thread merging intermediate results blocks. Time used: T4
+   *
+   * Then we have following equations:
+   * - singleThreadCpuTimeNs = T1 + T4
+   * - multipleThreadCpuTimeNs = T2 * N
+   * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+   * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+   * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   *
+   * Thus:
+   * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+   * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+   * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+   * = totalWallClockTimeNs + T2 * (N - 1)
+   * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+   */
+  public static long calTotalThreadCpuTimeNs(long totalWallClockTimeNs, long multipleThreadCpuTimeNs, int numServerThreads) {
+    return totalWallClockTimeNs + multipleThreadCpuTimeNs * (numServerThreads - 1) / numServerThreads;
+  }

Review comment:
       We should use parentheses here for precedence I think. Also, we should use rounding. It should be :
   
   double a = multipleThreadCpuTimeNs / (double)numServerThreads ; 
   // we should not get 0 above and that's why casting to double
   double b = a * (numThreads - 1) => 
   long c = Math.round ((double)wallClock  + b)
   return c
   
   Please make sure unit tests cover all scenarios




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628637817



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
+    /*
+     * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process
+     * a query, this is because system activities time such as OS paging, GC, context switching are not captured by
+     * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases.
+     * - phase 1: single thread preparing. Time used: T1
+     * - phase 2: N threads processing segments in parallel, each thread use time T2
+     * - phase 3: GC/OS paging. Time used: T3
+     * - phase 4: single thread merging intermediate results blocks. Time used: T4
+     *
+     * Then we have following equations:
+     * - singleThreadCpuTimeNs = T1 + T4
+     * - multipleThreadCpuTimeNs = T2 * N
+     * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+     * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+     * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     *
+     * Thus:
+     * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+     * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+     * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     * = totalWallClockTimeNs + T2 * (N - 1)
+     * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+     */
+    if (totalThreadCpuTimeNs < totalWallClockTimeNs) {

Review comment:
       The only (unlikely) case that we should be careful about is if total turns out to be negative for some reason. That means the wall clock was probably moving a lot slower than the measured thread times. This indicates some fault as far as I can see. I don't think it can happen in a properly functioning system.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630603301



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/operator/ThreadCpuTimeMeasurementTest.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.operator;
+
+import org.locationtech.jts.util.Assert;
+import org.testng.annotations.Test;
+
+
+public class ThreadCpuTimeMeasurementTest {
+
+  @Test
+  public void testAdditionTransformFunction() {

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629764516



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       Thinking more about this, it is actually incorrect. 
   
   We basically want the number of query execution threads. The `_numThreads` argument in BaseCombineOperator is misleading (it always has been). The code goes over it in the loop and submits as many async tasks to the executor as the `_numThreads`. That doesn't really mean that executor service will use those many threads. 
   
   You might submit 100 tasks (futures) to an executor service of fixed pool of 8 threads. The threads will be 8. The number of tasks or futures as we call in the code is 100
   
   The server side query execution executor service is created during server startup and the number of threads are determined then and they remain the same since all queries share the executor service. They are referred to as (PQW) or worker threads. Each query gets a  single PQR (runner thread) and 1 or more worker (PQW) threads. So, we need to use PQW + PQR here to get the actual number of threads executing the query. Let's check the code where executor service is created and find out the number. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629773754



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;

Review comment:
       @mqliang , let's please also add a note saying that --  if/when the threadCpuTime based instrumentation is done for other parts of execution (planning, pruning etc), we will have to change the wallClockTime computation accordingly. Right now everything under InstanceResponseOperator is the one that is instrumented with threadCpuTime. So, the wallClockTime is also measured here for adjustment. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] richardstartin commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
richardstartin commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628473211



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();

Review comment:
       Just a note that if this is backed by `ThreadMXBean.getCurrentThreadCpuTime` and the JVM is not running on bare metal, this call can take up to half a millisecond.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628638182



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;

Review comment:
       @mqliang, based on our latest discussion, let's use the wall clock time from the time query started execution (the query main thread was put on CPU). To be precise, this happens at line 118 in processQuery() method in [ServerQueryExecutorV1Impl](https://github.com/apache/incubator-pinot/blob/master/pinot-core/src/main/java/org/apache/pinot/core/query/executor/ServerQueryExecutorV1Impl.java#L118). 
   
   If we take the absolute wall clock time (the one measured in QueryScheduler right from the time query was arrived on the server), then we will also be accounting for queue wait that happened due to QPS etc which we don't want part of query cpu time. Confirmed with @mcvsubbu and he is also aligned with this. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mcvsubbu commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mcvsubbu commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r628636851



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();

Review comment:
       @richardstartin  we have a config to turn the thread cpu measurements off. If there is a way of auto-detecting this case and turning it off after the server is started, we can add the code. Otherwise, we can turn it off before starting the server.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r627772457



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
+    /*
+     * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process
+     * a query, this is because system activities time such as OS paging, GC, context switching are not captured by
+     * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases.
+     * - phase 1: single thread preparing. Time used: T1
+     * - phase 2: N threads processing segments in parallel, each thread use time T2
+     * - phase 3: GC/OS paging. Time used: T3

Review comment:
       Let's add unit tests for this adjustment code. We already have examples we collected from production verification. You can use them




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630518278



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;

Review comment:
       done




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] mqliang commented on pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
mqliang commented on pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#issuecomment-837288857


    integration-test failed due to flaky issue, close&re-open PR to trigger a re-run.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r627771754



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/InstanceResponseOperator.java
##########
@@ -45,8 +46,39 @@ protected InstanceResponseBlock getNextBlock() {
     DataTable dataTable = instanceResponseBlock.getInstanceResponseDataTable();
 
     mainThreadTimer.stop();
-    long totalThreadCpuTimeNs =
-        intermediateResultsBlock.getExecutionThreadCpuTimeNs() + mainThreadTimer.getThreadTimeNs();
+    long endWallClockTimeNs = System.nanoTime();
+
+    long singleThreadCpuTimeNs = mainThreadTimer.getThreadTimeNs();
+    long multipleThreadCpuTimeNs = intermediateResultsBlock.getExecutionThreadCpuTimeNs();
+    long totalThreadCpuTimeNs = singleThreadCpuTimeNs + multipleThreadCpuTimeNs;
+    long totalWallClockTimeNs = endWallClockTimeNs - startWallClockTimeNs;
+    /*
+     * It's possible that totalThreadCpuTimeNs < totalWallClockTimeNs even if server launch multiple threads to process
+     * a query, this is because system activities time such as OS paging, GC, context switching are not captured by
+     * totalThreadCpuTimeNs. For example, let's divide query processing into 4 phases.
+     * - phase 1: single thread preparing. Time used: T1
+     * - phase 2: N threads processing segments in parallel, each thread use time T2
+     * - phase 3: GC/OS paging. Time used: T3
+     * - phase 4: single thread merging intermediate results blocks. Time used: T4
+     *
+     * Then we have following equations:
+     * - singleThreadCpuTimeNs = T1 + T4
+     * - multipleThreadCpuTimeNs = T2 * N
+     * - totalWallClockTimeNs = T1 + T2 + T3 + T4 = singleThreadCpuTimeNs + T2 + T3
+     * - totalThreadCpuTimeNsWithoutSystemActivities = T1 + T2 * N + T4 = singleThreadCpuTimeNs + T2 * N
+     * - systemActivitiesTimeNs = T3 = (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     *
+     * Thus:
+     * totalThreadCpuTimeNsWithSystemActivities = totalThreadCpuTimeNsWithoutSystemActivities + systemActivitiesTimeNs
+     * = totalThreadCpuTimeNsWithoutSystemActivities + T3
+     * = totalThreadCpuTimeNsWithoutSystemActivities + (totalWallClockTimeNs - totalThreadCpuTimeNsWithoutSystemActivities) + T2 * (N - 1)
+     * = totalWallClockTimeNs + T2 * (N - 1)
+     * = totalWallClockTimeNs + (multipleThreadCpuTimeNs / N) * (N - 1)
+     */
+    if (totalThreadCpuTimeNs < totalWallClockTimeNs) {

Review comment:
       System activity (GC, OS paging , lock wait etc) that consumes  CPU is always there. Based on the verification that we did in production, for the case when totalThreadCpuTime << wallClock, there is more of it. 
   
   However, even if totalThreadCpuTime > wallClock which we saw in many cases, there is still some cpu activity unaccounted for and we should try to include it.
   
   So, let's do this adjustment in both the cases. Let's not have this if check




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r630591154



##########
File path: pinot-core/src/test/java/org/apache/pinot/core/operator/ThreadCpuTimeMeasurementTest.java
##########
@@ -0,0 +1,65 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.pinot.core.operator;
+
+import org.locationtech.jts.util.Assert;
+import org.testng.annotations.Test;
+
+
+public class ThreadCpuTimeMeasurementTest {
+
+  @Test
+  public void testAdditionTransformFunction() {

Review comment:
       This name is not correct




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org


[GitHub] [incubator-pinot] siddharthteotia commented on a change in pull request #6886: Adjust threadCpuTime calculation

Posted by GitBox <gi...@apache.org>.
siddharthteotia commented on a change in pull request #6886:
URL: https://github.com/apache/incubator-pinot/pull/6886#discussion_r629764516



##########
File path: pinot-core/src/main/java/org/apache/pinot/core/operator/combine/BaseCombineOperator.java
##########
@@ -127,7 +127,7 @@ public void runJob() {
       // Deregister the main thread and wait for all threads done
       phaser.awaitAdvance(phaser.arriveAndDeregister());
     }
-    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get());
+    CombineOperatorUtils.setExecutionStatistics(mergedBlock, _operators, totalWorkerThreadCpuTimeNs.get(), _numThreads);

Review comment:
       This is actually incorrect. 
   
   We basically want the number of query execution threads. The `_numThreads` argument in BaseCombineOperator is misleading (it always has been). The code goes over it in the loop and submits as many async tasks to the executor as the `_numThreads`. That doesn't really mean that executor service will use those many threads. 
   
   You might submit 100 tasks (futures) to an executor service of fixed pool of 8 threads. The threads will be 8. The number of tasks or futures as we call in the code is 100
   
   The server side query execution executor service is created during server startup and the number of threads are determined then and they remain the same since all queries share the executor service. They are referred to as (PQW) or worker threads. Each query gets a  single PQR (runner thread) and 1 or more worker (PQW) threads. So, we need to use PQW + PQR here to get the actual number of threads executing the query. Let's check the code where executor service is created and find out the number. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@pinot.apache.org
For additional commands, e-mail: commits-help@pinot.apache.org