Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:40:09 UTC

[GitHub] [beam] damccorm opened a new issue, #21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.

damccorm opened a new issue, #21123:
URL: https://github.com/apache/beam/issues/21123

   I'm running TFX pipelines on a Flink cluster using Beam in k8s. However, extra Python packages passed to the Flink runner (or rather, to the Beam worker side-car) are only installed once per deployment cycle. Example:
    - Flink is deployed and is up and running.
    - A TFX pipeline starts and submits a job to Flink along with a Python whl of custom code and Beam ops.
    - The Beam worker installs the package and the pipeline finishes successfully.
    - A new TFX pipeline is built in which a new Beam fn is introduced; the pipeline is started and the new whl is submitted as in the second step.
    - This time, the new package is not installed in the Beam worker, so the job fails on a reference that does not exist in the worker's environment.
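
   For illustration only, the two submissions above might differ in nothing but the wheel they pass. The sketch below builds the Beam pipeline flags for both runs. It assumes the side-car worker is reached through the EXTERNAL environment type at `localhost:50000`; the helper name, the Flink master address, and the wheel paths are hypothetical and not taken from the original report.

```python
from typing import List


def build_beam_pipeline_args(
    flink_master: str,
    extra_package: str,
    worker_pool_address: str = "localhost:50000",  # assumed side-car address
) -> List[str]:
    """Return Beam pipeline flags for a portable Flink job that ships a custom wheel.

    Hypothetical helper: only the flag names are real Beam pipeline options.
    """
    return [
        "--runner=FlinkRunner",
        f"--flink_master={flink_master}",
        # EXTERNAL: the job connects to an already-running worker pool
        # (assumed here to be the Beam worker side-car) instead of starting one.
        "--environment_type=EXTERNAL",
        f"--environment_config={worker_pool_address}",
        # The wheel with custom code that the worker is expected to install.
        f"--extra_package={extra_package}",
    ]


# First pipeline run: the original wheel (hypothetical path).
first_run = build_beam_pipeline_args(
    "flink-jobmanager:8081", "dist/custom_ops-0.1.0-py3-none-any.whl"
)

# Second pipeline run: same cluster and worker, new wheel (hypothetical path).
second_run = build_beam_pipeline_args(
    "flink-jobmanager:8081", "dist/custom_ops-0.2.0-py3-none-any.whl"
)
```

   In a setup like this, both jobs reach the same long-lived worker, so only the `--extra_package` value changes between runs; that is the value the report says is not picked up the second time.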
   
    
   
   I started using Flink with Beam version 2.27, and this has been an issue the whole time.
   
   Imported from Jira [BEAM-12792](https://issues.apache.org/jira/browse/BEAM-12792). Original Jira may contain additional context.
   Reported by: ConverJens.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn closed issue #21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.

Posted by GitBox <gi...@apache.org>.
tvalentyn closed issue #21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.
URL: https://github.com/apache/beam/issues/21123




[GitHub] [beam] kennknowles commented on issue #21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21123:
URL: https://github.com/apache/beam/issues/21123#issuecomment-1246021253

   @damccorm @tvalentyn, as experts in our ML integrations and Python, do you know how these installs work and why the install would not occur a second time? I guess it has to do with how the portable Flink runner deploys Python SDK harness containers. But wouldn't a fresh container start up for a new pipeline, and hence be a clean start?




[GitHub] [beam] tvalentyn commented on issue #21123: Multiple jobs running on Flink session cluster reuse the persistent Python environment.

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21123:
URL: https://github.com/apache/beam/issues/21123#issuecomment-1248829026

   We actually have an open PR on this: https://github.com/apache/beam/pull/16658
   There was a seemingly working solution, but it showed very strange behavior on GCE VMs that we didn't root-cause. I'll take another look.

