Posted to dev@beam.apache.org by Chamikara Jayalath <ch...@google.com> on 2019/11/13 05:24:51 UTC

Make environment_id a top level attribute of PTransform

This was discussed in a JIRA [1], but I don't think it was mentioned on the
dev list.

Not having environment_id as a top-level attribute of PTransform [2] makes
it difficult to track the Environment [3] a given PTransform should be
executed in. For example, in Dataflow we have to branch on the payload type
in several places to extract the Environment from a given PTransform proto.

Making environment_id a top-level attribute of PTransform and removing it
from the various payload types will make tracking environments easier. The
code will also become less error-prone, since we won't have to branch on
every possible payload type.
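
To illustrate the difference, here is a minimal Python sketch. It does not
use the real Runner API protos: the dataclasses, the placeholder URNs, and
both helper functions are hypothetical stand-ins, and the set of payload
types shown is only an assumption for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified stand-ins for the Runner API messages. The real
# definitions live in beam_runner_api.proto and carry many more fields.


@dataclass
class ParDoPayload:
    environment_id: str  # environment nested inside the payload


@dataclass
class CombinePayload:
    environment_id: str  # same field repeated in another payload type


@dataclass
class PTransform:
    urn: str
    payload: Optional[object] = None
    environment_id: str = ""  # the proposed top-level attribute


# Placeholder URNs for this sketch only; these are not Beam's real URNs.
PAR_DO_URN = "example:pardo"
COMBINE_URN = "example:combine"


def environment_id_nested(t: PTransform) -> Optional[str]:
    """Nested layout: one branch per payload type that carries an environment."""
    if t.urn == PAR_DO_URN:
        return t.payload.environment_id
    if t.urn == COMBINE_URN:
        return t.payload.environment_id
    # Any payload type added later needs another branch here; forgetting
    # one silently yields "no environment".
    return None


def environment_id_top_level(t: PTransform) -> Optional[str]:
    """Top-level layout: a single lookup, independent of the payload type."""
    return t.environment_id or None
```

With the nested layout, every consumer has to repeat the per-URN branching;
with the top-level attribute, the lookup stays the same when new payload
types are introduced.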

Any objections to making this change?

Thanks,
Cham

[1] https://issues.apache.org/jira/browse/BEAM-7850
[2] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L99
[3] https://github.com/apache/beam/blob/master/model/pipeline/src/main/proto/beam_runner_api.proto#L1021

Re: Make environment_id a top level attribute of PTransform

Posted by Chamikara Jayalath <ch...@google.com>.
On Wed, Nov 13, 2019 at 10:42 AM Luke Cwik <lc...@google.com> wrote:

> The original idea was that only the messages that need to set it would
> contain the attribute, but once something becomes common enough it makes
> sense to have it as an optional parameter, so +1.
>
> Are there areas where the environment id will still exist outside of a
> PTransform?
>

The only scenario I can think of is support for first order functions (UDFs)
in cross-language transforms, where a function might have to be executed in
a different environment than the PTransform. But I don't think that should
make the very common case, where a PTransform and its associated functions
share one environment, hard or error-prone. We could later introduce a way
to specify an environment along with the associated functions (and any other
properties we need) when we design support for first order functions in
cross-language transforms.
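
As a rough illustration of that possible later extension, the sketch below
lets a function optionally name its own environment and otherwise fall back
to the PTransform's. Nothing here is an existing or agreed design: the
FunctionSpec and TransformSketch classes and the resolve_environment helper
are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FunctionSpec:
    """Hypothetical description of a UDF attached to a transform."""
    name: str
    environment_id: Optional[str] = None  # set only when it differs


@dataclass
class TransformSketch:
    """Hypothetical PTransform stand-in with a top-level environment_id."""
    environment_id: str
    functions: List[FunctionSpec] = field(default_factory=list)


def resolve_environment(transform: TransformSketch, fn: FunctionSpec) -> str:
    """Use the function's own environment if given, else the transform's."""
    if fn.environment_id is not None:
        return fn.environment_id
    return transform.environment_id
```

In the common case no function sets environment_id, so everything resolves
to the transform's environment and callers never need to look further.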

Thanks,
Cham



Re: Make environment_id a top level attribute of PTransform

Posted by Luke Cwik <lc...@google.com>.
The original idea was that only the messages that need to set it would
contain the attribute, but once something becomes common enough it makes
sense to have it as an optional parameter, so +1.

Are there areas where the environment id will still exist outside of a
PTransform?

