Posted to issues@beam.apache.org by "Valentyn Tymofieiev (Jira)" <ji...@apache.org> on 2021/04/08 21:42:00 UTC

[jira] [Commented] (BEAM-7850) Make Environment a top level attribute of PTransform

    [ https://issues.apache.org/jira/browse/BEAM-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317508#comment-17317508 ] 

Valentyn Tymofieiev commented on BEAM-7850:
-------------------------------------------

I think the Beam SDK should ensure that, by the time a RunnerAPI representation of a pipeline is created, the Environment instances associated with subtransforms include the hints defined for the composite. The merging logic may need to be defined individually per hint. The default implementation can be to use the hint values defined on the composite. For int hints like min_ram_per_vcpu, one could take the max of the two values.
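
To make the per-hint merging idea concrete, here is a minimal sketch in plain Python. It does not use any Beam API: merge_hints, HINT_MERGERS, use_composite_value, and the hint name some_other_hint are hypothetical names chosen for illustration, and hints are modeled as plain dicts.

```python
from typing import Any, Callable, Dict

Hints = Dict[str, Any]
Merger = Callable[[Any, Any], Any]


def use_composite_value(composite_value: Any, sub_value: Any) -> Any:
    """Default rule: the value defined on the composite wins."""
    return composite_value


# Hypothetical registry of per-hint merge rules. Hints without an entry
# fall back to use_composite_value.
HINT_MERGERS: Dict[str, Merger] = {
    # For an int hint, take the larger of the two requirements.
    "min_ram_per_vcpu": max,
}


def merge_hints(composite_hints: Hints, sub_hints: Hints) -> Hints:
    """Return the hints a subtransform should carry after merging."""
    merged = dict(sub_hints)
    for name, composite_value in composite_hints.items():
        if name in merged:
            merger = HINT_MERGERS.get(name, use_composite_value)
            merged[name] = merger(composite_value, merged[name])
        else:
            # The subtransform has no value of its own: inherit.
            merged[name] = composite_value
    return merged
```

With this sketch, a composite asking for min_ram_per_vcpu=4 merged into a subtransform asking for 8 keeps 8, while a hint with no registered rule takes the composite's value.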

Note that we initially plan to express hints at the transform level (in Python, via .with_resource_hints() builder methods). Environment does not appear at this stage; it only comes into the picture when we translate the pipeline into RunnerAPI. In the future we may let users create Environment objects and attach them to transforms when they instantiate the pipeline. Currently this is not possible in Python, and AppliedPTransform.environment_id is not used anywhere as far as I can tell. If/when we allow users to explicitly define PTransforms with Environments, we should also ensure that the resource hints on composite transforms are properly propagated to the subtransforms.
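
As a rough illustration of propagating hints from composites down to subtransforms at translation time, the sketch below walks a toy transform tree. It is not Beam code: TransformNode and propagate_hints are hypothetical, and it applies only the simple "composite value wins" rule to every hint.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class TransformNode:
    """Hypothetical stand-in for an applied transform with hints."""
    name: str
    hints: Dict[str, Any] = field(default_factory=dict)
    parts: List["TransformNode"] = field(default_factory=list)


def propagate_hints(
    node: TransformNode,
    inherited: Optional[Dict[str, Any]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Map each leaf transform name to its effective hints.

    Hints inherited from enclosing composites override the node's own
    values (the simple default rule; a real implementation might merge
    per hint instead).
    """
    effective = dict(node.hints)
    effective.update(inherited or {})
    if not node.parts:
        return {node.name: effective}
    result: Dict[str, Dict[str, Any]] = {}
    for part in node.parts:
        result.update(propagate_hints(part, effective))
    return result


# Usage: a composite with one hint and two subtransforms.
composite = TransformNode(
    "Composite",
    hints={"min_ram_per_vcpu": 4},
    parts=[
        TransformNode("Read"),
        TransformNode("Parse", hints={"min_ram_per_vcpu": 2}),
    ],
)
leaf_hints = propagate_hints(composite)
```

In this toy model the effective hints of each leaf would be what ends up attached to the leaf's Environment when the pipeline is translated.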

> Make Environment a top level attribute of PTransform
> ----------------------------------------------------
>
>                 Key: BEAM-7850
>                 URL: https://issues.apache.org/jira/browse/BEAM-7850
>             Project: Beam
>          Issue Type: Sub-task
>          Components: beam-model
>            Reporter: Chamikara Madhusanka Jayalath
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P2
>             Fix For: 2.19.0
>
>          Time Spent: 4h
>  Remaining Estimate: 0h
>
> Currently Environment is not a top level attribute of the PTransform (of runner API proto).
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
> Instead it is hidden inside various payload objects. For example, for ParDo, environment will be inside SdkFunctionSpec of ParDoPayload.
> [https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99]
>  
> This makes tracking environment of different types of PTransforms harder and we have to fork code (on the type of PTransform) to extract the Environment where the PTransform should be executed. It will probably be simpler to just make Environment a top level attribute of PTransform.


