Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 15:37:12 UTC

[GitHub] [beam] damccorm opened a new issue, #20163: Update Python SDK to construct Dataflow job requests from Beam runner API protos

damccorm opened a new issue, #20163:
URL: https://github.com/apache/beam/issues/20163

   Currently, portable runners are expected to do the following when constructing a runner-specific job.
   
   SDK-specific job graph -> Beam runner API proto -> Runner-specific job request
   
   Portable Spark and Flink follow this model.
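   
   For illustration, here is a minimal sketch of that path from the Python SDK side, assuming only the public Pipeline.to_runner_api() API; the final translation step is shown as a hypothetical helper, since each portable runner implements its own.
   
   ```python
   import apache_beam as beam
   from apache_beam.options.pipeline_options import PipelineOptions
   
   # SDK-specific job graph: an ordinary Python Pipeline object.
   pipeline = beam.Pipeline(options=PipelineOptions())
   _ = pipeline | beam.Create([1, 2, 3]) | beam.Map(lambda x: x * 2)
   
   # SDK-specific job graph -> Beam runner API proto.
   pipeline_proto = pipeline.to_runner_api()
   
   # Beam runner API proto -> runner-specific job request.
   # (Hypothetical helper; portable Spark/Flink perform this translation runner-side.)
   # job_request = translate_proto_to_job_request(pipeline_proto)
   ```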
   
   Dataflow does the following.
   
   SDK-specific job graph -> Runner-specific job request
   
   Beam runner API proto -> Upload to GCS -> Download at workers
   
    
   
   We should update Dataflow to follow the former path, which is the model expected of all portable runners.
   
   This will simplify the job construction logic for cross-language transforms on Dataflow.
   
   We can probably start by implementing this in the Python SDK for the portions of the pipeline received by expanding external transforms; see the sketch below.
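   
   As a rough sketch (not Dataflow's actual job-construction code), the runner-side translation could walk the runner API proto's components directly, so transforms produced by expanding external transforms are handled like any other transform. The step format below is made up for illustration.
   
   ```python
   from apache_beam.portability.api import beam_runner_api_pb2
   
   
   def steps_from_proto(pipeline_proto: beam_runner_api_pb2.Pipeline):
       """Builds a toy list of job 'steps' from a Beam runner API pipeline proto.
   
       Illustrative only: a real runner emits its own job request format.
       """
       steps = []
       for transform_id, transform in pipeline_proto.components.transforms.items():
           # Cross-language transforms arrive here already expanded into proto
           # components, so no SDK-level objects are needed to construct the job.
           steps.append({
               'id': transform_id,
               'urn': transform.spec.urn,
               'inputs': dict(transform.inputs),
               'outputs': dict(transform.outputs),
           })
       return steps
   ```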
   
   cc: [~lcwik] [~robertwb]
   
    
   
   Imported from Jira [BEAM-10012](https://issues.apache.org/jira/browse/BEAM-10012). Original Jira may contain additional context.
   Reported by: chamikara.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org