Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/10 22:54:44 UTC

[GitHub] [beam] rszper opened a new issue, #21811: [Task]: Update documentation to better explain "sdk_worker_parallelism" and "number_of_worker_harness_threads"

rszper opened a new issue, #21811:
URL: https://github.com/apache/beam/issues/21811

   ### What needs to happen?
   
   Page to update: https://beam.apache.org/documentation/runtime/sdk-harness-config/
   
   Information about what needs to be updated and why:
   
   From what I can tell from browsing the code, "sdk_worker_parallelism" was used by the JRH (aka runner v1 Python streaming) to control how many Python containers to launch, and also by Flink.
   
   The most useful documentation would be https://beam.apache.org/releases/javadoc/2.37.0/org/apache/beam/sdk/options/PortablePipelineOptions.html#setSdkWorkerParallelism-int- but it seems it doesn't make it into javadoc so here's the info: https://github.com/apache/beam/blob/fdccad20f2af4f4af84b55529acae4b9d0004a01/sdks/java/core/src/main/java/org/apache/beam/sdk/options/PortablePipelineOptions.java#L69
   
   It sets how many SDK processes run per worker process.
   
   Of course, it assumes the runner has a concept of "worker" and that things are organized as "processes" which is not always the case and not required by the model.
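To make the "SDK processes per worker process" relationship concrete, here is a minimal sketch in plain Python. It does not use Beam and only models the arithmetic. The helper `total_sdk_processes`, the `--num_workers` flag, and the default of 1 are assumptions made for illustration, not something taken from the Beam code.

```python
import argparse


def total_sdk_processes(num_workers: int, sdk_worker_parallelism: int) -> int:
    """Hypothetical helper: total SDK processes if every worker process
    launches `sdk_worker_parallelism` SDK processes."""
    if num_workers < 0 or sdk_worker_parallelism < 1:
        raise ValueError("need num_workers >= 0 and sdk_worker_parallelism >= 1")
    return num_workers * sdk_worker_parallelism


# Parse flags shaped like pipeline options. The defaults are assumptions
# for this sketch only.
parser = argparse.ArgumentParser()
parser.add_argument("--sdk_worker_parallelism", type=int, default=1)
parser.add_argument("--num_workers", type=int, default=1)  # hypothetical flag
args = parser.parse_args(["--sdk_worker_parallelism=2", "--num_workers=3"])

# 3 worker processes, each with 2 SDK processes
print(total_sdk_processes(args.num_workers, args.sdk_worker_parallelism))
```

This only holds for runners that actually organize execution into workers and processes, which is the caveat noted above.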
   
   "number_of_worker_harness_threads" is a Dataflow-specific option that sets the number of threads the JRH or the pre-portability Java worker uses. Its most important use is for the service to tell the worker how many threads to spawn based on the chosen machine type, because Java + Docker makes it unreliable to introspect the machine to choose this. I do not believe UW / runner v2 uses this flag.
   
   ### Issue Priority
   
   Priority: 3
   
   ### Issue Component
   
   Component: website


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn closed issue #21811: [Task]: Update documentation to better explain "sdk_worker_parallelism" and "number_of_worker_harness_threads"

Posted by GitBox <gi...@apache.org>.
tvalentyn closed issue #21811: [Task]: Update documentation to better explain "sdk_worker_parallelism" and "number_of_worker_harness_threads"
URL: https://github.com/apache/beam/issues/21811




[GitHub] [beam] rszper commented on issue #21811: [Task]: Update documentation to better explain "sdk_worker_parallelism" and "number_of_worker_harness_threads"

Posted by GitBox <gi...@apache.org>.
rszper commented on issue #21811:
URL: https://github.com/apache/beam/issues/21811#issuecomment-1152792498

   I can't seem to assign this to myself, but if it's possible, please assign to me.




[GitHub] [beam] rszper commented on issue #21811: [Task]: Update documentation to better explain "sdk_worker_parallelism" and "number_of_worker_harness_threads"

Posted by GitBox <gi...@apache.org>.
rszper commented on issue #21811:
URL: https://github.com/apache/beam/issues/21811#issuecomment-1152793654

   .take-issue

