Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 21:10:18 UTC

[GitHub] [beam] kennknowles opened a new issue, #18867: Artifact stager should validate the manifest at pipeline submission time.

kennknowles opened a new issue, #18867:
URL: https://github.com/apache/beam/issues/18867

   One step of executing a Beam pipeline is staging the pipeline's dependencies to the runner. For an example in the Python SDK, see [https://github.com/apache/beam/blob/1e2218aebf4902ca2f9107c885ff7ef0e1ef6eb8/sdks/python/apache_beam/runners/portability/stager.py#L83](https://github.com/apache/beam/blob/1e2218aebf4902ca2f9107c885ff7ef0e1ef6eb8/sdks/python/apache_beam/runners/portability/stager.py#L83) and its implementations.
   
   As part of this process, we create a manifest of all staged artifacts. We should identify the requirements that make a manifest correct (for example, no artifact appears more than once; see also: https://github.com/apache/beam/blob/b0b7e3bf4941f874d360923ffd1d03e38befc589/sdks/go/pkg/beam/artifact/gcsproxy/retrieval.go#L122), and verify these requirements on the SDK side at pipeline submission time, so that submission fails faster.
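
A minimal sketch of what such a submission-time check might look like. Everything here is an assumption made for illustration: `ArtifactEntry`, `find_manifest_errors`, and `validate_manifest` are hypothetical names, not Beam's manifest type or stager API. Only the no-duplicates rule comes from the description above; the non-empty-name rule is an added guess at one more plausible requirement.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ArtifactEntry:
    """Hypothetical stand-in for one manifest entry (not Beam's real type)."""
    name: str
    sha256: str = ""


def find_manifest_errors(entries: Iterable[ArtifactEntry]) -> List[str]:
    """Return a human-readable message for every violated requirement."""
    entries = list(entries)
    errors = []

    # Assumed requirement (not stated in the issue): names must be non-empty.
    for index, entry in enumerate(entries):
        if not entry.name:
            errors.append(f"entry {index} has an empty name")

    # Requirement from the issue: an artifact must not appear more than once.
    counts = Counter(entry.name for entry in entries if entry.name)
    for name, count in sorted(counts.items()):
        if count > 1:
            errors.append(f"artifact {name!r} is staged {count} times")

    return errors


def validate_manifest(entries: Iterable[ArtifactEntry]) -> None:
    """Raise ValueError listing all problems, so submission can fail early."""
    errors = find_manifest_errors(entries)
    if errors:
        raise ValueError("invalid artifact manifest: " + "; ".join(errors))
```

Collecting all violations before raising, rather than stopping at the first, would let a user fix every manifest problem in a single round trip; whether that is the right trade-off for the stager is an open design question.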
   
    
   
   Imported from Jira [BEAM-4407](https://issues.apache.org/jira/browse/BEAM-4407). Original Jira may contain additional context.
   Reported by: tvalentyn.
   This issue has child subcomponents which were not migrated over. See the original Jira for more information.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org