Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 15:37:12 UTC
[GitHub] [beam] damccorm opened a new issue, #20163: Update Python SDK to construct Dataflow job requests from Beam runner API protos
URL: https://github.com/apache/beam/issues/20163
Currently, portable runners are expected to do the following when constructing a runner-specific job:
SDK-specific job graph -> Beam runner API proto -> runner-specific job request
Portable Spark and Flink follow this model.
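The portable translation path above can be sketched as two successive conversions. The function and field names below are purely illustrative stand-ins, not actual Beam APIs (in the real SDK the proto step is `Pipeline.to_runner_api()` producing a `beam_runner_api.proto` message):

```python
# Hypothetical sketch of the portable translation path.
# All names here are illustrative, not real Beam APIs.

def sdk_graph_to_proto(sdk_graph):
    """SDK-specific job graph -> Beam runner API proto (illustrative)."""
    # A real proto would carry components, coders, environments, etc.
    return {"components": list(sdk_graph["transforms"])}

def proto_to_job_request(proto, runner_name):
    """Beam runner API proto -> runner-specific job request (illustrative)."""
    # The runner translates the portable proto into its own job format.
    return {"runner": runner_name, "pipeline": proto}

# Example: the same proto can feed any portable runner's translator.
sdk_graph = {"transforms": ["Read", "ParDo", "Write"]}
proto = sdk_graph_to_proto(sdk_graph)
request = proto_to_job_request(proto, "FlinkRunner")
```

The key property is that the runner-specific request is derived only from the portable proto, never directly from the SDK graph.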
Dataflow does the following:
SDK-specific job graph -> runner-specific job request
Beam runner API proto -> upload to GCS -> download at workers
We should update Dataflow to follow the former path, which all portable runners are expected to follow.
This will simplify the cross-language transforms job construction logic for Dataflow.
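The two paths can be contrasted in a small sketch. All names below are hypothetical placeholders, not real Beam or Dataflow APIs; the point is only that today the job request and the proto are derived independently from the SDK graph, whereas the proposal derives the request from the proto:

```python
# Hypothetical sketch contrasting Dataflow's current and proposed job
# construction paths. Names are illustrative, not actual Beam/Dataflow APIs.

def current_dataflow_submit(sdk_graph):
    # Today: the job request is built directly from the SDK graph...
    job_request = {"steps": list(sdk_graph["transforms"])}
    # ...while the runner API proto travels separately via GCS
    # (uploaded at submission, downloaded by workers).
    proto = {"components": list(sdk_graph["transforms"])}
    staged = ("gs://staging/pipeline.pb", proto)  # illustrative GCS object
    return job_request, staged

def proposed_dataflow_submit(sdk_graph):
    # Proposed: derive the job request from the proto, as other
    # portable runners (Spark, Flink) already do.
    proto = {"components": list(sdk_graph["transforms"])}
    job_request = {"steps": list(proto["components"])}
    return job_request, proto

graph = {"transforms": ["Read", "GroupByKey", "Write"]}
current_request, staged_proto = current_dataflow_submit(graph)
proposed_request, proto = proposed_dataflow_submit(graph)
```

Since expanded external transforms arrive as runner API protos, the proposed path lets Dataflow consume them directly instead of reconstructing an SDK-level graph first.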
We can probably start by implementing this in the Python SDK for the portions of the pipeline received by expanding external transforms.
cc: [~lcwik] [~robertwb]
Imported from Jira [BEAM-10012](https://issues.apache.org/jira/browse/BEAM-10012). Original Jira may contain additional context.
Reported by: chamikara.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org