Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 16:33:38 UTC
[GitHub] [beam] damccorm opened a new issue, #20321: Create a mechanism to run a custom worker initialization code on Python workers
damccorm opened a new issue, #20321:
URL: https://github.com/apache/beam/issues/20321
A couple of Beam users mentioned a use case where some initialization code needs to run on Python workers before pipeline processing starts.
Such code needs to run early in the main() method of the Python worker [1].
The Java SDK provides this capability via JvmInitializer [2] (BEAM-6872). Let's add such a capability for Python users as well.
[1] https://github.com/apache/beam/blob/7ad4c4c8e601e39573aae7b4d778be2e908a0868/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L85
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
Imported from Jira [BEAM-10039](https://issues.apache.org/jira/browse/BEAM-10039). Original Jira may contain additional context.
Reported by: tvalentyn.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on issue #20321: Create a mechanism to run a custom worker initialization code on Python workers
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #20321:
URL: https://github.com/apache/beam/issues/20321#issuecomment-1613521830
Also, note that there may be multiple SDK harness processes running on the same machine, so if the initialization code needs to run only once, a custom entrypoint may be a better avenue. If it needs to run in each Python process, then a custom entrypoint won't work.
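
To make the "once per machine" versus "once per Python process" distinction concrete, here is a minimal sketch using only the Python standard library. It is not a Beam API: `init_once_per_process`, `init_once_per_machine`, and the marker-file path are hypothetical names chosen for illustration.

```python
import os

# Module-level flag: every Python process gets its own copy of this variable.
_initialized_in_this_process = False


def init_once_per_process(init_fn):
    """Run init_fn at most once in the current Python process.

    Returns True if init_fn ran during this call, False otherwise.
    """
    global _initialized_in_this_process
    if _initialized_in_this_process:
        return False
    init_fn()
    _initialized_in_this_process = True
    return True


def init_once_per_machine(init_fn, marker_path):
    """Run init_fn at most once across all processes sharing marker_path.

    Returns True if init_fn ran during this call, False otherwise.
    """
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only one
        # process can create the marker and go on to run init_fn.
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    init_fn()
    return True
```

In this sketch, processes that find the marker already present return immediately and do not wait for the first process to finish its initialization, so work that others depend on would need an additional lock or readiness signal.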
[GitHub] [beam] tvalentyn commented on issue #20321: Create a mechanism to run a custom worker initialization code on Python workers
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #20321:
URL: https://github.com/apache/beam/issues/20321#issuecomment-1613454845
To my knowledge, we still don't have a dedicated, documented mechanism for this. Possible workarounds that might help:
- manually add the required worker initialization steps to the SDK worker ( https://github.com/apache/beam/blob/7ad4c4c8e601e39573aae7b4d778be2e908a0868/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L85 ) and supply a custom Beam SDK built from sources via the --sdk_location flag.
- Beam plugins ( https://github.com/apache/beam/pull/16920 )
- supply a custom container with a custom entrypoint ( https://cloud.google.com/dataflow/docs/guides/using-custom-containers#custom-entrypoint )
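
As a rough illustration of the custom-entrypoint workaround, the sketch below shows a wrapper that performs setup once per container start and then hands control to the regular worker launcher. The names `one_time_setup`, `build_boot_command`, and the `EXAMPLE_WORKER_INIT_DONE` variable are hypothetical, and the `/opt/apache/beam/boot` path is an assumption about where the SDK container's default launcher lives; check the custom container guide linked above for the authoritative details.

```python
import os

# Assumption: location of the default launcher inside the Beam SDK container.
DEFAULT_BOOT = "/opt/apache/beam/boot"


def one_time_setup():
    """Hypothetical setup step; runs once per container start."""
    # Environment variables set here are inherited by the exec'ed launcher.
    os.environ["EXAMPLE_WORKER_INIT_DONE"] = "1"


def build_boot_command(argv, boot=DEFAULT_BOOT):
    """Forward every argument the runner passed to us on to the launcher."""
    return [boot, *argv[1:]]


def main(argv, exec_fn=os.execv):
    """Run the setup, then replace this process with the default launcher."""
    one_time_setup()
    cmd = build_boot_command(argv)
    # os.execv does not return on success; exec_fn is injectable so the
    # hand-off can be exercised without a real launcher binary.
    exec_fn(cmd[0], cmd)
```

A script used as the container's ENTRYPOINT would call `main(sys.argv)`. Because this runs before any SDK harness process starts, it fits initialization that should happen once per container rather than in each Python process.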