Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 16:33:38 UTC

[GitHub] [beam] damccorm opened a new issue, #20321: Create a mechanism to run a custom worker initialization code on Python workers

damccorm opened a new issue, #20321:
URL: https://github.com/apache/beam/issues/20321

   A couple of Beam users mentioned a use case where some initialization code needs to run on Python workers before pipeline processing starts.
   
   Such code needs to run early in the main() method of the Python worker [1].
   
   The Java SDK provides this capability via JvmInitializer [2] (BEAM-6872). Let's add such a capability for Python users as well.
   
   [1] https://github.com/apache/beam/blob/7ad4c4c8e601e39573aae7b4d778be2e908a0868/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L85
   [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/harness/JvmInitializer.java
   
   Imported from Jira [BEAM-10039](https://issues.apache.org/jira/browse/BEAM-10039). Original Jira may contain additional context.
   Reported by: tvalentyn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #20321: Create a mechanism to run a custom worker initialization code on Python workers

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #20321:
URL: https://github.com/apache/beam/issues/20321#issuecomment-1613521830

   Also, note that there may be multiple SDK harness processes running on the same machine. If the initialization code needs to run only once per machine, a custom entrypoint may be a better avenue. If it needs to run in each Python process, a custom entrypoint won't work.
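The difference between "once per process" and "once per machine" can be expressed as two guards. The sketch below is illustrative only. init_once_per_process and init_once_per_machine are hypothetical helpers, not Beam APIs, and the marker-file approach assumes that all harness processes share the filesystem where the marker lives, ideally a local one where O_EXCL creation is atomic.

```python
import os
import tempfile
import threading
from typing import Callable

# Hypothetical marker location; assumes every SDK harness process on the
# machine (or container) sees the same temporary directory.
DEFAULT_MARKER_PATH = os.path.join(tempfile.gettempdir(), "beam_worker_init.marker")

_process_lock = threading.Lock()
_process_initialized = False


def init_once_per_process(init_fn: Callable[[], None]) -> bool:
    """Run init_fn at most once in this Python process; return True if it ran."""
    global _process_initialized
    with _process_lock:
        if _process_initialized:
            return False
        init_fn()
        _process_initialized = True
        return True


def init_once_per_machine(
    init_fn: Callable[[], None], marker_path: str = DEFAULT_MARKER_PATH
) -> bool:
    """Run init_fn at most once across processes that share marker_path.

    O_CREAT | O_EXCL creates the marker atomically on a local filesystem,
    so only one process proceeds. The others return False right away,
    even if the winning process has not finished init_fn yet.
    """
    try:
        fd = os.open(marker_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    try:
        init_fn()
    except BaseException:
        # Remove the marker so that a later process can retry initialization.
        os.remove(marker_path)
        raise
    return True
```

The per-machine guard does not make other processes wait. If they depend on the initialization being complete, they would also need to poll for a "done" signal.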




[GitHub] [beam] tvalentyn commented on issue #20321: Create a mechanism to run a custom worker initialization code on Python workers

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #20321:
URL: https://github.com/apache/beam/issues/20321#issuecomment-1613454845

   To my knowledge, we still don't have a dedicated, documented mechanism to do this. Possible workarounds that might help:
   
   -  Manually add the required worker initialization steps to the SDK worker (https://github.com/apache/beam/blob/7ad4c4c8e601e39573aae7b4d778be2e908a0868/sdks/python/apache_beam/runners/worker/sdk_worker_main.py#L85), and supply a custom-built Beam SDK from sources via the --sdk_location flag.
   -  Beam plugins (https://github.com/apache/beam/pull/16920).
   -  Supply a custom container with a custom entrypoint: https://cloud.google.com/dataflow/docs/guides/using-custom-containers#custom-entrypoint.
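The custom-entrypoint workaround can be sketched as a wrapper script that the container image's ENTRYPOINT points to. The script does its setup and then hands control to the Beam boot program, forwarding all arguments. This is a minimal sketch under several assumptions. The boot binary path /opt/apache/beam/boot is assumed to match the Beam SDK container image. initialize_worker_machine, build_boot_argv, and the MY_WORKER_INIT variable are hypothetical placeholders. As noted earlier in this thread, setup done here runs once per container, not once per Python process.

```python
#!/usr/bin/env python3
"""Hypothetical custom container entrypoint: run setup, then exec Beam's boot."""
import os
import sys
from typing import List

# Assumption: location of the boot binary in the Beam Python SDK container image.
BOOT_BINARY = "/opt/apache/beam/boot"


def initialize_worker_machine() -> None:
    """Placeholder for one-time setup; replace with the real steps.

    os.execv keeps the current environment, so variables set here are
    visible to boot and normally to the SDK harness processes it starts.
    """
    os.environ["MY_WORKER_INIT"] = "done"  # hypothetical variable name


def build_boot_argv(argv: List[str]) -> List[str]:
    """Forward every argument the container received to the boot binary."""
    return [BOOT_BINARY, *argv[1:]]


def main() -> None:
    initialize_worker_machine()
    boot_argv = build_boot_argv(sys.argv)
    # Replace this process with boot so that it keeps the same PID and
    # receives the container's signals directly.
    os.execv(boot_argv[0], boot_argv)


if __name__ == "__main__":
    main()
```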
    
   
   

