Posted to github@beam.apache.org by "al97 (via GitHub)" <gi...@apache.org> on 2023/07/14 20:50:59 UTC

[GitHub] [beam] al97 opened a new pull request, #27195: Report total time at max active threads

al97 opened a new pull request, #27195:
URL: https://github.com/apache/beam/pull/27195

   These metrics are needed as part of an internal project on thread scaling.
   Total time spent at max active threads will be joined with worker CPU utilization to see whether we are bounded by the number of parallel work items, with the ultimate goal of reducing worker idleness.
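
The accounting behind "total time spent at max active threads" can be sketched in isolation. This is a minimal illustration, not the code from the PR: the class `MaxActiveTimeTracker`, its method names, and the injected clock are all hypothetical, chosen so the interval arithmetic can be followed without real threads or sleeps.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: accumulates the time during which every worker slot is busy. */
final class MaxActiveTimeTracker {
  private final int maxThreads;
  private final LongSupplier clockMillis; // injected so the logic can be exercised without sleeping
  private int active = 0;
  private long saturatedSinceMillis = 0;
  private long totalSaturatedMillis = 0;

  MaxActiveTimeTracker(int maxThreads, LongSupplier clockMillis) {
    this.maxThreads = maxThreads;
    this.clockMillis = clockMillis;
  }

  /** Call when a work item starts running. */
  synchronized void onStart() {
    active++;
    if (active == maxThreads) {
      // The pool just became saturated: remember when.
      saturatedSinceMillis = clockMillis.getAsLong();
    }
  }

  /** Call when a work item finishes. */
  synchronized void onFinish() {
    if (active == maxThreads) {
      // The pool is about to leave saturation: close the interval.
      totalSaturatedMillis += clockMillis.getAsLong() - saturatedSinceMillis;
    }
    active--;
  }

  synchronized long totalSaturatedMillis() {
    return totalSaturatedMillis;
  }
}

final class MaxActiveTimeTrackerDemo {
  public static void main(String[] args) {
    long[] now = {0}; // fake clock, advanced by hand
    MaxActiveTimeTracker tracker = new MaxActiveTimeTracker(2, () -> now[0]);
    tracker.onStart();   // 1 of 2 slots busy at t=0
    now[0] = 100;
    tracker.onStart();   // saturated from t=100
    now[0] = 250;
    tracker.onFinish();  // saturation ends at t=250
    now[0] = 400;
    tracker.onFinish();
    System.out.println(tracker.totalSaturatedMillis()); // prints 150
  }
}
```

Only the 150 ms between the second start and the first finish counts, which is the quantity the description proposes to compare against CPU utilization.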
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1623923809

   Still working on this?




[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1600888343

   assign to next reviewer




[GitHub] [beam] github-actions[bot] commented on pull request #27195: Report total time at max active threads

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1599729903

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @jrmccluskey added as fallback since no labels match configuration
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).




[GitHub] [beam] al97 closed pull request #27195: Report total time at max active threads

Posted by "al97 (via GitHub)" <gi...@apache.org>.
al97 closed pull request #27195: Report total time at max active threads
URL: https://github.com/apache/beam/pull/27195




[GitHub] [beam] bvolpato commented on a diff in pull request #27195: Report total time at max active threads

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on code in PR #27195:
URL: https://github.com/apache/beam/pull/27195#discussion_r1264053615


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##########
@@ -51,7 +55,31 @@ public BoundedQueueExecutor(
             keepAliveTime,
             unit,
             new LinkedBlockingQueue<>(),
-            threadFactory);
+            threadFactory) {
+          @Override
+          protected void beforeExecute(Thread t, Runnable r) {
+            super.beforeExecute(t, r);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize - 1) {

Review Comment:
   Nit: I'm always wary of off-by-one errors; a `>=` would probably be safer here.
   
   





[GitHub] [beam] bvolpato commented on a diff in pull request #27195: Report total time at max active threads

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on code in PR #27195:
URL: https://github.com/apache/beam/pull/27195#discussion_r1264056587


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##########
@@ -51,7 +55,31 @@ public BoundedQueueExecutor(
             keepAliveTime,
             unit,
             new LinkedBlockingQueue<>(),
-            threadFactory);
+            threadFactory) {
+          @Override
+          protected void beforeExecute(Thread t, Runnable r) {
+            super.beforeExecute(t, r);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize - 1) {
+                startTimeMaxActiveThreadsUsed = System.currentTimeMillis();
+              }
+              activeCount.incrementAndGet();
+            }
+          }
+
+          @Override
+          protected void afterExecute(Runnable r, Throwable t) {
+            super.afterExecute(r, t);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize) {

Review Comment:
   Nit: same idea here with `getAndDecrement()`. I think you'd then only need the synchronized block around the update to the long.
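
Taken together, the review nits suggest hooks along the following lines. This is a sketch under stated assumptions, not the merged Beam code: `SaturationTimingExecutor`, `maxThreads`, `totalTimeMaxActiveThreadsUsed`, and `runDemo` are hypothetical names (only `activeCount` and `startTimeMaxActiveThreadsUsed` appear in the diff). The sketch deliberately keeps the fused atomic call inside the synchronized block: if the counter update were moved outside the lock, a finishing thread could in principle observe saturation before the starting thread has written the start timestamp.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the executor in the diff; not the merged Beam code. */
final class SaturationTimingExecutor extends ThreadPoolExecutor {
  private final int maxThreads;
  private final AtomicInteger activeCount = new AtomicInteger();
  private long startTimeMaxActiveThreadsUsed; // guarded by this
  private long totalTimeMaxActiveThreadsUsed; // guarded by this (assumed name)

  SaturationTimingExecutor(int maxThreads) {
    super(maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    this.maxThreads = maxThreads;
  }

  @Override
  protected void beforeExecute(Thread t, Runnable r) {
    super.beforeExecute(t, r);
    synchronized (this) {
      // getAndIncrement returns the pre-increment value; >= guards against off-by-one drift.
      if (activeCount.getAndIncrement() >= maxThreads - 1) {
        startTimeMaxActiveThreadsUsed = System.currentTimeMillis();
      }
    }
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    super.afterExecute(r, t);
    synchronized (this) {
      // getAndDecrement returns the pre-decrement value, i.e. the count while this task ran.
      if (activeCount.getAndDecrement() >= maxThreads) {
        totalTimeMaxActiveThreadsUsed += System.currentTimeMillis() - startTimeMaxActiveThreadsUsed;
      }
    }
  }

  synchronized long totalTimeMaxActiveThreadsUsed() {
    return totalTimeMaxActiveThreadsUsed;
  }

  /** Saturates a 2-thread pool for roughly holdMillis and returns the recorded total. */
  static long runDemo(long holdMillis) {
    SaturationTimingExecutor pool = new SaturationTimingExecutor(2);
    CountDownLatch bothRunning = new CountDownLatch(2);
    CountDownLatch release = new CountDownLatch(1);
    for (int i = 0; i < 2; i++) {
      pool.execute(() -> {
        bothRunning.countDown();
        try {
          release.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    try {
      bothRunning.await(); // both beforeExecute hooks have run by this point
      Thread.sleep(holdMillis);
      release.countDown();
      pool.shutdown();
      pool.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return pool.totalTimeMaxActiveThreadsUsed();
  }

  public static void main(String[] args) {
    System.out.println("ms at max active threads: " + runDemo(50));
  }
}
```

The demo holds both tasks on a latch rather than relying on task duration, so the pool is known to be saturated before the timed interval begins.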





[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1602729241

   You can run `./gradlew :runners:google-cloud-dataflow-java:worker:spotlessApply` from the top level of the directory to fix the formatting issues




[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1638616793

   Once the last workflows pass I will merge. Thank you!




[GitHub] [beam] bvolpato commented on a diff in pull request #27195: Report total time at max active threads

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on code in PR #27195:
URL: https://github.com/apache/beam/pull/27195#discussion_r1264054141


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##########
@@ -51,7 +55,31 @@ public BoundedQueueExecutor(
             keepAliveTime,
             unit,
             new LinkedBlockingQueue<>(),
-            threadFactory);
+            threadFactory) {
+          @Override
+          protected void beforeExecute(Thread t, Runnable r) {
+            super.beforeExecute(t, r);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize - 1) {
+                startTimeMaxActiveThreadsUsed = System.currentTimeMillis();
+              }
+              activeCount.incrementAndGet();

Review Comment:
   We could make it "more" atomic
   
   ```suggestion
                 if (activeCount.getAndIncrement() == maximumPoolSize - 1) {
                   startTimeMaxActiveThreadsUsed = System.currentTimeMillis();
                 }
   ```
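
To illustrate what the fused call buys: `AtomicInteger.getAndIncrement()` returns the value held before the increment, so the comparison and the update form a single atomic step. In the diff the check already sits inside a `synchronized` block, so both forms behave the same there; the difference only shows when no lock is held. The small demo below is a sketch with hypothetical names (`TransitionDetectionDemo`, `countSaturationObservers`) showing that, among several racing threads, exactly one observes the transition to the maximum.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

final class TransitionDetectionDemo {
  /**
   * Starts `threads` threads that each bump a shared counter once and returns
   * how many of them saw the pre-increment value maxPoolSize - 1.
   */
  static int countSaturationObservers(int threads, int maxPoolSize) {
    AtomicInteger activeCount = new AtomicInteger();
    AtomicInteger observers = new AtomicInteger();
    CountDownLatch go = new CountDownLatch(1);
    Thread[] workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(() -> {
        try {
          go.await(); // line all threads up so they race on the counter
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
        // Each pre-increment value is handed to exactly one caller.
        if (activeCount.getAndIncrement() == maxPoolSize - 1) {
          observers.incrementAndGet();
        }
      });
      workers[i].start();
    }
    go.countDown();
    try {
      for (Thread w : workers) {
        w.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return observers.get();
  }

  public static void main(String[] args) {
    System.out.println(countSaturationObservers(8, 8)); // prints 1
  }
}
```

With a separate `get()` followed by `incrementAndGet()` and no lock, two threads could read the same value, so the transition might be seen twice or not at all.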





[GitHub] [beam] al97 commented on pull request #27195: Report total time at max active threads

Posted by "al97 (via GitHub)" <gi...@apache.org>.
al97 commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1634645765

   assign to next reviewer




[GitHub] [beam] jrmccluskey merged pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey merged PR #27195:
URL: https://github.com/apache/beam/pull/27195




[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1636266194

   @bvolpato do you mind giving this a look? Java is not in my wheelhouse




[GitHub] [beam] al97 commented on a diff in pull request #27195: Report total time at max active threads

Posted by "al97 (via GitHub)" <gi...@apache.org>.
al97 commented on code in PR #27195:
URL: https://github.com/apache/beam/pull/27195#discussion_r1264096620


##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##########
@@ -51,7 +55,31 @@ public BoundedQueueExecutor(
             keepAliveTime,
             unit,
             new LinkedBlockingQueue<>(),
-            threadFactory);
+            threadFactory) {
+          @Override
+          protected void beforeExecute(Thread t, Runnable r) {
+            super.beforeExecute(t, r);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize - 1) {

Review Comment:
   ack, done.



##########
runners/google-cloud-dataflow-java/worker/src/main/java/org/apache/beam/runners/dataflow/worker/util/BoundedQueueExecutor.java:
##########
@@ -51,7 +55,31 @@ public BoundedQueueExecutor(
             keepAliveTime,
             unit,
             new LinkedBlockingQueue<>(),
-            threadFactory);
+            threadFactory) {
+          @Override
+          protected void beforeExecute(Thread t, Runnable r) {
+            super.beforeExecute(t, r);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize - 1) {
+                startTimeMaxActiveThreadsUsed = System.currentTimeMillis();
+              }
+              activeCount.incrementAndGet();
+            }
+          }
+
+          @Override
+          protected void afterExecute(Runnable r, Throwable t) {
+            super.afterExecute(r, t);
+            synchronized (this) {
+              if (activeCount.get() == maximumPoolSize) {

Review Comment:
   ack, done.





[GitHub] [beam] mosche commented on a diff in pull request #27195: Report total time at max active threads

Posted by "mosche (via GitHub)" <gi...@apache.org>.
mosche commented on code in PR #27195:
URL: https://github.com/apache/beam/pull/27195#discussion_r1268003510


##########
runners/google-cloud-dataflow-java/worker/src/test/java/org/apache/beam/runners/dataflow/worker/StreamingDataflowWorkerTest.java:
##########
@@ -2852,6 +2853,74 @@ public void testActiveWorkForShardedKeys() throws Exception {
     Mockito.verifyNoMoreInteractions(mockExecutor);
   }
 
+  @Test
+  public void testMaxThreadMetric() throws Exception {

Review Comment:
   Hi @al97 
   would you mind having another look at this test? It looks flaky on Jenkins (#27555).
   Thanks so much :)





[GitHub] [beam] bvolpato commented on pull request #27195: Report total time at max active threads

Posted by "bvolpato (via GitHub)" <gi...@apache.org>.
bvolpato commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1637165048

   Run Java PreCommit




[GitHub] [beam] jrmccluskey commented on pull request #27195: Report total time at max active threads

Posted by "jrmccluskey (via GitHub)" <gi...@apache.org>.
jrmccluskey commented on PR #27195:
URL: https://github.com/apache/beam/pull/27195#issuecomment-1602729463

   Run Java PreCommit

