Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:02:08 UTC

[GitHub] [beam] damccorm opened a new issue, #21073: Revisit process of dependency staging in Beam Python

damccorm opened a new issue, #21073:
URL: https://github.com/apache/beam/issues/21073

   There are a few issues:
   
   1) Including Beam itself in requirements.txt causes unnecessary friction and is suboptimal, because Beam already stages itself to the workers and Beam workers already include Beam's dependencies. This is not clear from https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/, yet from a user's perspective including Beam in requirements.txt seems natural.
   
   2) Staging the sources of all dependencies listed in requirements.txt, along with their transitive dependencies, in some cases triggers a hidden package recompilation initiated by pip. The reason is that pip cannot always identify a package's dependencies without building the package; see [1-3] for pointers. This increases the time it takes to launch a Beam job and may require additional software (such as Linux packages providing header files, or gcc dependencies) to be available. The behavior causes friction and confusion, is not obvious, and is beyond Beam's control.
   
   [1] https://github.com/pypa/pip/issues/8387
   [2] https://github.com/pypa/pip/issues/7995
   [3] https://discuss.python.org/t/pip-download-just-the-source-packages-no-building-no-metadata-etc/4651
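
To illustrate issue 1, here is a minimal user-side sketch that drops `apache-beam` entries from the contents of a requirements file before it is passed to the pipeline. The helper name `strip_beam_requirement` and the regular expression are hypothetical and are not part of the Beam SDK; this only shows the idea under the assumption that each requirement sits on its own line.

```python
import re

# Hypothetical helper, not part of the Beam SDK.
# Matches a requirement line that names apache-beam, optionally with
# extras (e.g. [gcp]) and a version specifier or environment marker.
_BEAM_LINE = re.compile(
    r"^\s*apache[-_.]beam(\[[^\]]*\])?\s*([=<>!~;@].*)?$",
    re.IGNORECASE,
)


def strip_beam_requirement(requirements_text):
    """Return the requirements text without lines that name apache-beam."""
    kept = []
    for line in requirements_text.splitlines():
        # Ignore trailing comments when deciding whether the line names Beam.
        spec = line.split("#", 1)[0].strip()
        if spec and _BEAM_LINE.match(spec):
            continue
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    sample = "apache-beam[gcp]==2.40.0\nnumpy>=1.21\nrequests\n"
    print(strip_beam_requirement(sample))
```

Comment lines and unrelated packages pass through unchanged, and only lines whose package name is `apache-beam` (in any case or separator spelling) are dropped.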
   
   Imported from Jira [BEAM-12555](https://issues.apache.org/jira/browse/BEAM-12555). Original Jira may contain additional context.
   Reported by: tvalentyn.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #21073: Revisit process of dependency staging in Beam Python

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on issue #21073:
URL: https://github.com/apache/beam/issues/21073#issuecomment-1708605172

   We no longer stage Beam and no longer stage the sources of packages; this was done a while back in a duplicate issue.




[GitHub] [beam] tvalentyn closed issue #21073: Revisit process of dependency staging in Beam Python

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn closed issue #21073: Revisit process of dependency staging in Beam Python
URL: https://github.com/apache/beam/issues/21073

