Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/07/20 20:52:50 UTC

[GitHub] [beam] yuvipanda commented on issue #22349: [Feature Request]: Host docker images with the `conda` package manager for Beam's Python SDK.

yuvipanda commented on issue #22349:
URL: https://github.com/apache/beam/issues/22349#issuecomment-1190746544

   Thanks a lot for working through this, @alxmrs!
   
   > I actually don't really know what problem Yuvi / the forge container is trying to solve in the first place. 
   
   This is a great question! We (the Pangeo project) maintain a set of Docker images pre-built with specific pinned versions of common dependencies in the earth sciences ecosystem - https://github.com/pangeo-data/pangeo-docker-images/. We provide dated tags that people can reference and use wherever they need to run code - in JupyterHubs (for interactive Jupyter use), in Dask (for scale-out workflows), etc. The goal of the `forge/` image is to provide a version that is usable in Apache Beam contexts. These are fairly heavy images - the conda-based environment build step takes at least 10 minutes, and often longer, to run - and so we can't really do this at *runtime*. We also want to make sure the packages are tested to work together, as they often have complex C (or even Fortran!) based dependencies. There's also a reproducibility angle here, as specifying the Docker image tag a workflow is using provides a better chance of longer-term reproducibility than just a list of packages to install.
   
   The goal is for end users to be able to pick a tag and know that it works with the rest of the geosciences stack curated by pangeo. I hope that helps clarify the goal of the forge/ image.
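
As a rough sketch of what "pick a tag" could look like on the Beam side: the helper below builds the `--sdk_container_image` pipeline flag from an image name and a dated tag, and refuses a moving tag such as `latest`. The image name, the tag value, and the helper `beam_container_args` are placeholders for illustration, not something taken from the pangeo repository.

```python
from typing import List

# Placeholder image name and dated tag, for illustration only.
IMAGE = "pangeo/forge"
TAG = "2022.07.13"


def beam_container_args(image: str, tag: str) -> List[str]:
    """Build Beam pipeline flags that pin the SDK worker container to one tag."""
    # A moving tag defeats the reproducibility goal, so reject it up front.
    if not tag or tag == "latest":
        raise ValueError("pin a dated tag rather than a moving one")
    return ["--sdk_container_image={}:{}".format(image, tag)]


args = beam_container_args(IMAGE, TAG)
print(args)
```

With `apache_beam` installed, a list like this could be passed to `PipelineOptions(args)` alongside the runner-specific flags; some runners may need additional options to accept a custom container.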
   
   I'm not entirely sure what the original problem with copying the Go binary was, as long as we weren't copying the Python packages. Possibly something to do with the inherited entrypoint? I'm doing some funky stuff in https://github.com/pangeo-data/pangeo-docker-images/pull/355/files#diff-a77643b43a7be453fa8556937bf32b27907e152a10d4c693f3e7670c66a44378 to have the entrypoint work for both Beam and Jupyter.
   

