You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@beam.apache.org by Ke Wu <ke...@gmail.com> on 2021/09/21 20:44:58 UTC

Multi Environment Support

Hello All,

We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:

1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?


Best,
Ke 


[1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344> 

Re: Multi Environment Support

Posted by Robert Burke <ro...@frantil.com>.
I agree! But the Go plugin model might permit fusion between different
plugins avoiding unnecessary serialization between multiple Go xlang
DoFns.... While still avoiding the Diamond Dep problem, since the plugins
are binaries themselves.

Not sure for certain as I haven't played around with it yet. (And likely
wont for a long while). There's a lot of beam side opportunity there i
think.

On Thu, Sep 30, 2021, 9:51 AM Luke Cwik <lc...@google.com> wrote:

> Self-contained binaries sound great and would fit the XLang model well.
> This would allow people to build pipelines using different modules at
> different versions making "DoFns" or sub-graphs of the pipeline with
> different binaries allowing for them to possibly side step compilation
> issues when users eventually hit diamond dependency issues.
>
> On Thu, Sep 30, 2021 at 9:35 AM Robert Burke <ro...@frantil.com> wrote:
>
>> Go SDK Note: At present there's no implementation for using Go SDK
>> transforms in Python or Java, and due to Go's Static Compilation approach,
>> it's not set in stone for how that will happen.
>>
>> Eg. Tiny self contained binaries? A plug-in framework? Neither would run
>> into the same runtime dependency collision java and python have on occasion
>> (they'd be resolved at compile time anyway...)
>>
>> On Thu, Sep 30, 2021, 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>> Ideally, we do not want to expose anything directly to users and we, as
>>> the framework and platform provider, separate things out under the hood.
>>>
>>> I would expect users to author their DoFn(s) in the same way as they do
>>> right now, but we expect to change the DoFn(s) that we provide, will be
>>> annotated/marked so that it can be recognized during runtime.
>>>
>>> In our use case, application is executed in Kubernetes environment
>>> therefore, we are expecting to directly use different docker image to
>>> isolate dependencies.
>>>
>>> e.g. we have docker image A, which is beam core, that is used to start
>>> job server and runner process. We have a docker image B, which contains
>>> DoFn(s) that platform provides to serve as a external worker pool service
>>> to execute platform provided DoFn(s), last but not least, users would have
>>> their own docker image represent their application, which will be used to
>>> start the external worker pool service to handle their own UDF execution.
>>>
>>> Does this make sense ?
>>>
>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>>
>>> That sounds neat. I think that before you try to figure out how to
>>> change Beam to fit this usecase is to think about what would be the best
>>> way for users to specify these requirements when they are constructing the
>>> pipeline. Once you have some samples that you could share the community
>>> would probably be able to give you more pointed advice.
>>> For example will they be running one application with a complicated
>>> class loader setup, if so then we could probably do away with multiple
>>> environments and try to have DoFn's recognize their specific class loader
>>> configuration and replicate it on the SDK harness side.
>>>
>>> Also, for performance reasons users may want to resolve their dependency
>>> issues to create a maximally fused graph to limit performance impact due to
>>> the encoding/decoding boundaries at the edges of those fused graphs.
>>>
>>> Finally, this could definitely apply to languages like Python and Go
>>> (now that Go has support for modules) as dependency issues are a
>>> common problem.
>>>
>>>
>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>>
>>>> Thanks for the advice.
>>>>
>>>> Here are some more background:
>>>>
>>>> We are building a feature called “split deployment” such that, we can
>>>> isolate framework/platform core from user code/dependencies to address
>>>> couple of operational challenges such as dependency conflict,
>>>> alert/exception triaging.
>>>>
>>>> With Beam’s portability framework, runner and sdk worker process
>>>> naturally decouples beam core and user UDFs(DoFn), which is awesome! On top
>>>> of this, we could further distinguish DoFn(s) that end user authors from
>>>> DoFn(s) that platform provides, therefore, we would like these DoFn(s) to
>>>> be executed in different environments, even in the same language, e.g. Java.
>>>>
>>>> Therefore, I am exploring approaches and recommendations what are the
>>>> proper way to do that.
>>>>
>>>> Let me know your thoughts, any feedback/advice is welcome.
>>>>
>>>> Best,
>>>> Ke
>>>>
>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>>>
>>>> Resource hints have a limited use case and might fit your need.
>>>> You could also try to use the expansion service XLang route to bring in
>>>> a different Java environment.
>>>> Finally, you could modify the pipeline proto that is generated directly
>>>> to choose which environment is used for which PTransform.
>>>>
>>>> Can you provide additional details as to why you would want to have two
>>>> separate java environments (e.g. incompatible versions of libraries)?
>>>>
>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>>>
>>>>> Thanks Luke for the reply, do you know what is the preferred way to
>>>>> configure a PTransform to be executed in a different environment from
>>>>> another PTransform when both are in the same SDK, e.g. Java ?
>>>>>
>>>>> Best,
>>>>> Ke
>>>>>
>>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>>>
>>>>> Environments that aren't exactly the same are already in separate
>>>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>>>
>>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may
>>>>> also be effectively dead-code.
>>>>>
>>>>> 1:
>>>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>>>
>>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>>>
>>>>>> Hello All,
>>>>>>
>>>>>> We have a use case where in a java portable pipeline, we would like
>>>>>> to have multiple environments setup in order that some executable stage
>>>>>> runs in one environment while some other executable stages runs in another
>>>>>> environment. Couple of questions on this:
>>>>>>
>>>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests
>>>>>> it is feature pending support
>>>>>> 2. If we did support it, what would the ideal mechanism to
>>>>>> distinguish ParDo/ExecutableStage to be executed in different environment,
>>>>>> is it through ResourceHints?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Ke
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>

Re: Multi Environment Support

Posted by Luke Cwik <lc...@google.com>.
Self-contained binaries sound great and would fit the XLang model well.
This would allow people to build pipelines using different modules at
different versions making "DoFns" or sub-graphs of the pipeline with
different binaries allowing for them to possibly side step compilation
issues when users eventually hit diamond dependency issues.

On Thu, Sep 30, 2021 at 9:35 AM Robert Burke <ro...@frantil.com> wrote:

> Go SDK Note: At present there's no implementation for using Go SDK
> transforms in Python or Java, and due to Go's Static Compilation approach,
> it's not set in stone for how that will happen.
>
> Eg. Tiny self contained binaries? A plug-in framework? Neither would run
> into the same runtime dependency collision java and python have on occasion
> (they'd be resolved at compile time anyway...)
>
> On Thu, Sep 30, 2021, 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>
>> Ideally, we do not want to expose anything directly to users and we, as
>> the framework and platform provider, separate things out under the hood.
>>
>> I would expect users to author their DoFn(s) in the same way as they do
>> right now, but we expect to change the DoFn(s) that we provide, will be
>> annotated/marked so that it can be recognized during runtime.
>>
>> In our use case, application is executed in Kubernetes environment
>> therefore, we are expecting to directly use different docker image to
>> isolate dependencies.
>>
>> e.g. we have docker image A, which is beam core, that is used to start
>> job server and runner process. We have a docker image B, which contains
>> DoFn(s) that platform provides to serve as a external worker pool service
>> to execute platform provided DoFn(s), last but not least, users would have
>> their own docker image represent their application, which will be used to
>> start the external worker pool service to handle their own UDF execution.
>>
>> Does this make sense ?
>>
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>
>> That sounds neat. I think that before you try to figure out how to change
>> Beam to fit this usecase is to think about what would be the best way for
>> users to specify these requirements when they are constructing the
>> pipeline. Once you have some samples that you could share the community
>> would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class
>> loader setup, if so then we could probably do away with multiple
>> environments and try to have DoFn's recognize their specific class loader
>> configuration and replicate it on the SDK harness side.
>>
>> Also, for performance reasons users may want to resolve their dependency
>> issues to create a maximally fused graph to limit performance impact due to
>> the encoding/decoding boundaries at the edges of those fused graphs.
>>
>> Finally, this could definitely apply to languages like Python and Go (now
>> that Go has support for modules) as dependency issues are a common problem.
>>
>>
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>> Thanks for the advice.
>>>
>>> Here are some more background:
>>>
>>> We are building a feature called “split deployment” such that, we can
>>> isolate framework/platform core from user code/dependencies to address
>>> couple of operational challenges such as dependency conflict,
>>> alert/exception triaging.
>>>
>>> With Beam’s portability framework, runner and sdk worker process
>>> naturally decouples beam core and user UDFs(DoFn), which is awesome! On top
>>> of this, we could further distinguish DoFn(s) that end user authors from
>>> DoFn(s) that platform provides, therefore, we would like these DoFn(s) to
>>> be executed in different environments, even in the same language, e.g. Java.
>>>
>>> Therefore, I am exploring approaches and recommendations what are the
>>> proper way to do that.
>>>
>>> Let me know your thoughts, any feedback/advice is welcome.
>>>
>>> Best,
>>> Ke
>>>
>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>>
>>> Resource hints have a limited use case and might fit your need.
>>> You could also try to use the expansion service XLang route to bring in
>>> a different Java environment.
>>> Finally, you could modify the pipeline proto that is generated directly
>>> to choose which environment is used for which PTransform.
>>>
>>> Can you provide additional details as to why you would want to have two
>>> separate java environments (e.g. incompatible versions of libraries)?
>>>
>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>>
>>>> Thanks Luke for the reply, do you know what is the preferred way to
>>>> configure a PTransform to be executed in a different environment from
>>>> another PTransform when both are in the same SDK, e.g. Java ?
>>>>
>>>> Best,
>>>> Ke
>>>>
>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>>
>>>> Environments that aren't exactly the same are already in separate
>>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>>
>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may
>>>> also be effectively dead-code.
>>>>
>>>> 1:
>>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>>
>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>>
>>>>> Hello All,
>>>>>
>>>>> We have a use case where in a java portable pipeline, we would like to
>>>>> have multiple environments setup in order that some executable stage runs
>>>>> in one environment while some other executable stages runs in another
>>>>> environment. Couple of questions on this:
>>>>>
>>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests
>>>>> it is feature pending support
>>>>> 2. If we did support it, what would the ideal mechanism to distinguish
>>>>> ParDo/ExecutableStage to be executed in different environment, is it
>>>>> through ResourceHints?
>>>>>
>>>>>
>>>>> Best,
>>>>> Ke
>>>>>
>>>>>
>>>>> [1]
>>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>>
>>>>>
>>>>
>>>>
>>>
>>

Re: Multi Environment Support

Posted by Robert Burke <ro...@frantil.com>.
Go SDK Note: At present there's no implementation for using Go SDK
transforms in Python or Java, and due to Go's Static Compilation approach,
it's not set in stone for how that will happen.

Eg. Tiny self contained binaries? A plug-in framework? Neither would run
into the same runtime dependency collision java and python have on occasion
(they'd be resolved at compile time anyway...)

On Thu, Sep 30, 2021, 9:25 AM Ke Wu <ke...@gmail.com> wrote:

> Ideally, we do not want to expose anything directly to users and we, as
> the framework and platform provider, separate things out under the hood.
>
> I would expect users to author their DoFn(s) in the same way as they do
> right now, but we expect to change the DoFn(s) that we provide, will be
> annotated/marked so that it can be recognized during runtime.
>
> In our use case, application is executed in Kubernetes environment
> therefore, we are expecting to directly use different docker image to
> isolate dependencies.
>
> e.g. we have docker image A, which is beam core, that is used to start job
> server and runner process. We have a docker image B, which contains DoFn(s)
> that platform provides to serve as a external worker pool service to
> execute platform provided DoFn(s), last but not least, users would have
> their own docker image represent their application, which will be used to
> start the external worker pool service to handle their own UDF execution.
>
> Does this make sense ?
>
> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>
> That sounds neat. I think that before you try to figure out how to change
> Beam to fit this usecase is to think about what would be the best way for
> users to specify these requirements when they are constructing the
> pipeline. Once you have some samples that you could share the community
> would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class
> loader setup, if so then we could probably do away with multiple
> environments and try to have DoFn's recognize their specific class loader
> configuration and replicate it on the SDK harness side.
>
> Also, for performance reasons users may want to resolve their dependency
> issues to create a maximally fused graph to limit performance impact due to
> the encoding/decoding boundaries at the edges of those fused graphs.
>
> Finally, this could definitely apply to languages like Python and Go (now
> that Go has support for modules) as dependency issues are a common problem.
>
>
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>
>> Thanks for the advice.
>>
>> Here are some more background:
>>
>> We are building a feature called “split deployment” such that, we can
>> isolate framework/platform core from user code/dependencies to address
>> couple of operational challenges such as dependency conflict,
>> alert/exception triaging.
>>
>> With Beam’s portability framework, runner and sdk worker process
>> naturally decouples beam core and user UDFs(DoFn), which is awesome! On top
>> of this, we could further distinguish DoFn(s) that end user authors from
>> DoFn(s) that platform provides, therefore, we would like these DoFn(s) to
>> be executed in different environments, even in the same language, e.g. Java.
>>
>> Therefore, I am exploring approaches and recommendations what are the
>> proper way to do that.
>>
>> Let me know your thoughts, any feedback/advice is welcome.
>>
>> Best,
>> Ke
>>
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a
>> different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly
>> to choose which environment is used for which PTransform.
>>
>> Can you provide additional details as to why you would want to have two
>> separate java environments (e.g. incompatible versions of libraries)?
>>
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>> Thanks Luke for the reply, do you know what is the preferred way to
>>> configure a PTransform to be executed in a different environment from
>>> another PTransform when both are in the same SDK, e.g. Java ?
>>>
>>> Best,
>>> Ke
>>>
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>
>>> Environments that aren't exactly the same are already in separate
>>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may
>>> also be effectively dead-code.
>>>
>>> 1:
>>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>
>>>> Hello All,
>>>>
>>>> We have a use case where in a java portable pipeline, we would like to
>>>> have multiple environments setup in order that some executable stage runs
>>>> in one environment while some other executable stages runs in another
>>>> environment. Couple of questions on this:
>>>>
>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it
>>>> is feature pending support
>>>> 2. If we did support it, what would the ideal mechanism to distinguish
>>>> ParDo/ExecutableStage to be executed in different environment, is it
>>>> through ResourceHints?
>>>>
>>>>
>>>> Best,
>>>> Ke
>>>>
>>>>
>>>> [1]
>>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>
>>>>
>>>
>>>
>>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Thanks for the pointer! I missed the part that pipeline’s options is copied from expansion options! 

Then, I think I just need a way to disable the auto creation of expansion service during job server start up.


> On Dec 15, 2021, at 11:29 AM, Chamikara Jayalath <ch...@google.com> wrote:
> 
> 
> 
> On Wed, Dec 15, 2021 at 10:38 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
> Thanks for Cham to chime in, having each expansion service instance to be able to serve a distinct environment which is configurable is what I am looking for, however, I don’t think it is practical atm.
> 
> The pipeline option it uses to register environment is NOT the pipeline option of expansion service but the option of the pipeline that is created with default, i.e. not configurable.
> 
> 
> AFAICT we get the options object to register the environment from the Pipeline  here: https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L527 <https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L527>
> 
> We set a copy of the input options passed to the expansion service in the pipeline object here: https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L583 <https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L583>
> 
> So probably you just need to make sure the options you need for configuring the environment are copied over when creating "effectiveOpts". Which specific options are you trying to use (assuming Java expansion service) ?
> 
> Thanks,
> Cham
> 
> 
>> On Dec 14, 2021, at 8:49 PM, Chamikara Jayalath <chamikara@google.com <ma...@google.com>> wrote:
>> 
>> 
>> 
>> On Mon, Dec 13, 2021 at 11:12 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>> Agreed, that is why I was hesitant to complicate ServerConfiguration such that it can configure the expansion service it brings up.
>> 
>> In this case, do you think we may just add one more option to disable the expansion service setup so for complicated use cases, expansion service can be brought up manually with desired options?
>> 
>> Another question is, if we update java Expansion service to use its own option instead of blindly uses pipeline’s option [1], will this cause any backward compatibility concern?
>> 
>> I haven't fully followed the thread but this indeed will be a backwards incompatible change. I believe currently we assume each expansion service instance to serve a distinct environment. Also we artificially limit the set of PipelineOptions that apply performing the expansion [1] since arbitrary PipelineOptions specified through the ExpansionService do not get propagated to the SDK that submits the pipeline to the runner. But we can expand this list as needed so that the environment of each expansion service is fully configurable through PipelineOptions
>> 
>> Thanks,
>> Cham
>> 
>> [1] https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572 <https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572>
>> 
>> 
>> Best,
>> Ke
>> 
>> [1] https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528 <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528>
>> 
>> pipeline.getOptions() -> this.pipelineOptions
>> 
>>> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> I think bringing up the expansion service as part of the job server is
>>> just a convenience for the simple case.
>>> 
>>> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Awesome, I am happy to make the update.
>>>> 
>>>> One more question, I noticed that expansion service is currently being brought up as part of job server in Java [1], if this is the preferred approach, does it mean, we should update ServerConfiguration to include such configurations, like expansion server port [2]
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> [1] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58>
>>>> [2] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88>
>>>> 
>>>> 
>>>> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>>> 
>>>> An expansion service should be able to specify its own environment, as
>>>> is done in Python, not blindly inherit it from the caller. Java should
>>>> be updated to have the same flexibility.
>>>> 
>>>> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Hi Robert,
>>>> 
>>>> After some more digging and exploration, I realized that it is not clean and straightforward to update PipelineOption to support such.
>>>> 
>>>> Therefore, I spent more time exploring the external transform & expansion service route. However, I noticed that this approach has the same limitation where we cannot configure environments ourselves.
>>>> 
>>>> This is because when ExpansionService register environment [1], it uses the pipeline options of the original pipeline. This limitation not only restrict our capability to configure a different external environment in external mode but also restrict Docker mode to specify docker image for different SDKs but use defaults.
>>>> 
>>>> Is this expected or do you think ExpansionService should be updated to be configurable of the environment it registers ?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> 
>>>> [1]
>>>> 
>>>> Java: https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526 <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526>
>>>> 
>>>> Python:
>>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39 <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39>
>>>> 
>>>> 
>>>> 
>>>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>>> 
>>>> Could you give a concrete example of what such a specification would look like?
>>>> 
>>>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Hi Robert,
>>>> 
>>>> Thanks for the pointer, using expansion service hack seems to work!
>>>> 
>>>> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>>> 
>>>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> I have a quick follow up questions.
>>>> 
>>>> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
>>>> 
>>>> 
>>>> PipelineOptions wasn't really built with the idea of multiple distinct
>>>> environments in mind. One hack you could do is put one of the
>>>> environments in an expansion service with its own environment (as if
>>>> it were written in a different language) and configure that
>>>> environment separately.
>>>> 
>>>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> This is great, let me try it out.
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>>>> 
>>>> beam:env:docker:v1 VS beam:env:docker:v11
>>>> 
>>>> Is this on the right track?
>>>> 
>>>> 
>>>> Yep.
>>>> 
>>>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>>>> 
>>>> 
>>>> It should already do the right thing here. That's how multi-language works.
>>>> 
>>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218 <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218>
>>>> 
>>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>>>> 
>>>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>>>> 
>>>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>>>> 
>>>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>>>> 
>>>> Does this make sense ?
>>>> 
>>>> 
>>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>>> "platform" transforms) which could be used to mark their environments
>>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>>> resource hints (even a "dummy" hint like
>>>> "use_platform_environment=True") to force these into a separate docker
>>>> image as well.
>>>> 
>>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>>> 
>>>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>>>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>>>> 
>>>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>>>> 
>>>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>>>> 
>>>> 
>>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Thanks for the advice.
>>>> 
>>>> Here are some more background:
>>>> 
>>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>>> 
>>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>>> 
>>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>>> 
>>>> Let me know your thoughts, any feedback/advice is welcome.
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>>> 
>>>> Resource hints have a limited use case and might fit your need.
>>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>>> 
>>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>>> 
>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>>> 
>>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>> 
>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>>> 
>>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
>>>> 
>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> 
>>>> Hello All,
>>>> 
>>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>>> 
>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>>> 
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> 
>>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344>
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>> 
> 


Re: Multi Environment Support

Posted by Chamikara Jayalath <ch...@google.com>.
On Wed, Dec 15, 2021 at 10:38 AM Ke Wu <ke...@gmail.com> wrote:

> Thanks for Cham to chime in, having each expansion service instance to be
> able to serve a distinct environment which is configurable is what I am
> looking for, however, I don’t think it is practical atm.
>
> The pipeline option it uses to register environment is NOT the pipeline
> option of expansion service but the option of the pipeline that is created
> with default, i.e. not configurable.
>


AFAICT we get the options object to register the environment from the
Pipeline  here:
https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L527

We set a copy of the input options passed to the expansion service in the
pipeline object here:
https://github.com/apache/beam/blob/bc82f6f84e72c520be397977edf4bf645782ac8f/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L583

So probably you just need to make sure the options you need for configuring
the environment are copied over when creating "effectiveOpts". Which
specific options are you trying to use (assuming Java expansion service) ?

Thanks,
Cham


> On Dec 14, 2021, at 8:49 PM, Chamikara Jayalath <ch...@google.com>
> wrote:
>
>
>
> On Mon, Dec 13, 2021 at 11:12 AM Ke Wu <ke...@gmail.com> wrote:
>
>> Agreed, that is why I was hesitant to complicate ServerConfiguration such
>> that it can configure the expansion service it brings up.
>>
>> In this case, do you think we may just add one more option to disable the
>> expansion service setup so for complicated use cases, expansion service can
>> be brought up manually with desired options?
>>
>> Another question is, if we update java Expansion service to use its own
>> option instead of blindly uses pipeline’s option [1], will this cause any
>> backward compatibility concern?
>>
>
> I haven't fully followed the thread but this indeed will be a backwards
> incompatible change. I believe currently we assume each expansion service
> instance to serve a distinct environment. Also we artificially limit the
> set of PipelineOptions that apply performing the expansion [1] since
> arbitrary PipelineOptions specified through the ExpansionService do not get
> propagated to the SDK that submits the pipeline to the runner. But we can
> expand this list as needed so that the environment of each expansion
> service is fully configurable through PipelineOptions
>
> Thanks,
> Cham
>
> [1]
> https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572
>
>
>> Best,
>> Ke
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528
>>
>> pipeline.getOptions() -> this.pipelineOptions
>>
>> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>> I think bringing up the expansion service as part of the job server is
>> just a convenience for the simple case.
>>
>> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Awesome, I am happy to make the update.
>>
>> One more question, I noticed that expansion service is currently being
>> brought up as part of job server in Java [1], if this is the preferred
>> approach, does it mean, we should update ServerConfiguration to include
>> such configurations, like expansion server port [2]
>>
>> Best,
>> Ke
>>
>> [1]
>> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58
>> [2]
>> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88
>>
>>
>> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <ro...@google.com> wrote:
>>
>> An expansion service should be able to specify its own environment, as
>> is done in Python, not blindly inherit it from the caller. Java should
>> be updated to have the same flexibility.
>>
>> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Hi Robert,
>>
>> After some more digging and exploration, I realized that it is not clean
>> and straightforward to update PipelineOption to support such.
>>
>> Therefore, I spent more time exploring the external transform & expansion
>> service route. However, I noticed that this approach has the same
>> limitation where we cannot configure environments ourselves.
>>
>> This is because when ExpansionService register environment [1], it uses
>> the pipeline options of the original pipeline. This limitation not only
>> restrict our capability to configure a different external environment in
>> external mode but also restrict Docker mode to specify docker image for
>> different SDKs but use defaults.
>>
>> Is this expected or do you think ExpansionService should be updated to be
>> configurable of the environment it registers ?
>>
>> Best,
>> Ke
>>
>>
>> [1]
>>
>> Java:
>> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>>
>> Python:
>>
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>>
>>
>>
>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <ro...@google.com> wrote:
>>
>> Could you give a concrete example of what such a specification would look
>> like?
>>
>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Hi Robert,
>>
>> Thanks for the pointer, using expansion service hack seems to work!
>>
>> On the other hand, since PipelineOptions is the place to configure
>> external service address anyway, do you think it makes sense to expand it
>> so it is capable of specifying multiple external environment to external
>> service address mapping?
>>
>> Best,
>> Ke
>>
>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
>>
>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> I have a quick follow up questions.
>>
>> When using multiple external environments, is there a way to configure
>> the multiple external service address? It looks like the current
>> PipelineOptions only supports specifying one external address.
>>
>>
>> PipelineOptions wasn't really built with the idea of multiple distinct
>> environments in mind. One hack you could do is put one of the
>> environments in an expansion service with its own environment (as if
>> it were written in a different language) and configure that
>> environment separately.
>>
>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>>
>> This is great, let me try it out.
>>
>> Best,
>> Ke
>>
>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>>
>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> I am able to annotate/mark a java transform by setting its resource hints
>> [1] as well, which resulted in a different environment id, e.g.
>>
>> beam:env:docker:v1 VS beam:env:docker:v11
>>
>> Is this on the right track?
>>
>>
>> Yep.
>>
>> If Yes, I suppose then I need to configure job bundle factory to be able
>> to understand multiple environments and configure them separately as well.
>>
>>
>> It should already do the right thing here. That's how multi-language
>> works.
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>
>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Ideally, we do not want to expose anything directly to users and we, as
>> the framework and platform provider, separate things out under the hood.
>>
>> I would expect users to author their DoFn(s) in the same way as they do
>> right now, but we expect to change the DoFn(s) that we provide, will be
>> annotated/marked so that it can be recognized during runtime.
>>
>> In our use case, application is executed in Kubernetes environment
>> therefore, we are expecting to directly use different docker image to
>> isolate dependencies.
>>
>> e.g. we have docker image A, which is beam core, that is used to start
>> job server and runner process. We have a docker image B, which contains
>> DoFn(s) that platform provides to serve as a external worker pool service
>> to execute platform provided DoFn(s), last but not least, users would have
>> their own docker image represent their application, which will be used to
>> start the external worker pool service to handle their own UDF execution.
>>
>> Does this make sense ?
>>
>>
>> In Python it's pretty trivial to annotate transforms (e.g. the
>> "platform" transforms) which could be used to mark their environments
>> prior to optimization (e.g. fusion). As mentioned, you could use
>> resource hints (even a "dummy" hint like
>> "use_platform_environment=True") to force these into a separate docker
>> image as well.
>>
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>
>> That sounds neat. I think that before you try to figure out how to change
>> Beam to fit this usecase is to think about what would be the best way for
>> users to specify these requirements when they are constructing the
>> pipeline. Once you have some samples that you could share the community
>> would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class
>> loader setup, if so then we could probably do away with multiple
>> environments and try to have DoFn's recognize their specific class loader
>> configuration and replicate it on the SDK harness side.
>>
>> Also, for performance reasons users may want to resolve their dependency
>> issues to create a maximally fused graph to limit performance impact due to
>> the encoding/decoding boundaries at the edges of those fused graphs.
>>
>> Finally, this could definitely apply to languages like Python and Go (now
>> that Go has support for modules) as dependency issues are a common problem.
>>
>>
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Thanks for the advice.
>>
>> Here are some more background:
>>
>> We are building a feature called “split deployment” such that, we can
>> isolate framework/platform core from user code/dependencies to address
>> couple of operational challenges such as dependency conflict,
>> alert/exception triaging.
>>
>> With Beam’s portability framework, runner and sdk worker process
>> naturally decouples beam core and user UDFs(DoFn), which is awesome! On top
>> of this, we could further distinguish DoFn(s) that end user authors from
>> DoFn(s) that platform provides, therefore, we would like these DoFn(s) to
>> be executed in different environments, even in the same language, e.g. Java.
>>
>> Therefore, I am exploring approaches and recommendations what are the
>> proper way to do that.
>>
>> Let me know your thoughts, any feedback/advice is welcome.
>>
>> Best,
>> Ke
>>
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a
>> different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly
>> to choose which environment is used for which PTransform.
>>
>> Can you provide additional details as to why you would want to have two
>> separate java environments (e.g. incompatible versions of libraries)?
>>
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Thanks Luke for the reply, do you know what is the preferred way to
>> configure a PTransform to be executed in a different environment from
>> another PTransform when both are in the same SDK, e.g. Java ?
>>
>> Best,
>> Ke
>>
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>
>> Environments that aren't exactly the same are already in separate
>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may
>> also be effectively dead-code.
>>
>> 1:
>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>
>> Hello All,
>>
>> We have a use case where in a java portable pipeline, we would like to
>> have multiple environments setup in order that some executable stage runs
>> in one environment while some other executable stages runs in another
>> environment. Couple of questions on this:
>>
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it
>> is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish
>> ParDo/ExecutableStage to be executed in different environment, is it
>> through ResourceHints?
>>
>>
>> Best,
>> Ke
>>
>>
>> [1]
>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>
>>
>>
>>
>>
>>
>>
>>
>>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Thanks for Cham to chime in, having each expansion service instance to be able to serve a distinct environment which is configurable is what I am looking for, however, I don’t think it is practical atm.

The pipeline option it uses to register environment is NOT the pipeline option of expansion service but the option of the pipeline that is created with default, i.e. not configurable.

> On Dec 14, 2021, at 8:49 PM, Chamikara Jayalath <ch...@google.com> wrote:
> 
> 
> 
> On Mon, Dec 13, 2021 at 11:12 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
> Agreed, that is why I was hesitant to complicate ServerConfiguration such that it can configure the expansion service it brings up.
> 
> In this case, do you think we may just add one more option to disable the expansion service setup so for complicated use cases, expansion service can be brought up manually with desired options?
> 
> Another question is, if we update java Expansion service to use its own option instead of blindly uses pipeline’s option [1], will this cause any backward compatibility concern?
> 
> I haven't fully followed the thread but this indeed will be a backwards incompatible change. I believe currently we assume each expansion service instance to serve a distinct environment. Also we artificially limit the set of PipelineOptions that apply performing the expansion [1] since arbitrary PipelineOptions specified through the ExpansionService do not get propagated to the SDK that submits the pipeline to the runner. But we can expand this list as needed so that the environment of each expansion service is fully configurable through PipelineOptions
> 
> Thanks,
> Cham
> 
> [1] https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572 <https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572>
> 
> 
> Best,
> Ke
> 
> [1] https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528 <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528>
> 
> pipeline.getOptions() -> this.pipelineOptions
> 
>> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>> 
>> I think bringing up the expansion service as part of the job server is
>> just a convenience for the simple case.
>> 
>> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Awesome, I am happy to make the update.
>>> 
>>> One more question, I noticed that expansion service is currently being brought up as part of job server in Java [1], if this is the preferred approach, does it mean, we should update ServerConfiguration to include such configurations, like expansion server port [2]
>>> 
>>> Best,
>>> Ke
>>> 
>>> [1] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58>
>>> [2] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88>
>>> 
>>> 
>>> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> An expansion service should be able to specify its own environment, as
>>> is done in Python, not blindly inherit it from the caller. Java should
>>> be updated to have the same flexibility.
>>> 
>>> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Hi Robert,
>>> 
>>> After some more digging and exploration, I realized that it is not clean and straightforward to update PipelineOption to support such.
>>> 
>>> Therefore, I spent more time exploring the external transform & expansion service route. However, I noticed that this approach has the same limitation where we cannot configure environments ourselves.
>>> 
>>> This is because when ExpansionService register environment [1], it uses the pipeline options of the original pipeline. This limitation not only restrict our capability to configure a different external environment in external mode but also restrict Docker mode to specify docker image for different SDKs but use defaults.
>>> 
>>> Is this expected or do you think ExpansionService should be updated to be configurable of the environment it registers ?
>>> 
>>> Best,
>>> Ke
>>> 
>>> 
>>> [1]
>>> 
>>> Java: https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526 <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526>
>>> 
>>> Python:
>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39 <https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39>
>>> 
>>> 
>>> 
>>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> Could you give a concrete example of what such a specification would look like?
>>> 
>>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Hi Robert,
>>> 
>>> Thanks for the pointer, using expansion service hack seems to work!
>>> 
>>> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> I have a quick follow up questions.
>>> 
>>> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
>>> 
>>> 
>>> PipelineOptions wasn't really built with the idea of multiple distinct
>>> environments in mind. One hack you could do is put one of the
>>> environments in an expansion service with its own environment (as if
>>> it were written in a different language) and configure that
>>> environment separately.
>>> 
>>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> This is great, let me try it out.
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>>> 
>>> beam:env:docker:v1 VS beam:env:docker:v11
>>> 
>>> Is this on the right track?
>>> 
>>> 
>>> Yep.
>>> 
>>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>>> 
>>> 
>>> It should already do the right thing here. That's how multi-language works.
>>> 
>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218 <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218>
>>> 
>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <robertwb@google.com <ma...@google.com>> wrote:
>>> 
>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>>> 
>>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>>> 
>>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>>> 
>>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>>> 
>>> Does this make sense ?
>>> 
>>> 
>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>> "platform" transforms) which could be used to mark their environments
>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>> resource hints (even a "dummy" hint like
>>> "use_platform_environment=True") to force these into a separate docker
>>> image as well.
>>> 
>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>> 
>>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>>> 
>>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>>> 
>>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>>> 
>>> 
>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Thanks for the advice.
>>> 
>>> Here are some more background:
>>> 
>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>> 
>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>> 
>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>> 
>>> Let me know your thoughts, any feedback/advice is welcome.
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>> 
>>> Resource hints have a limited use case and might fit your need.
>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>> 
>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>> 
>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>> 
>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>> 
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>> 
>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
>>> 
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> 
>>> Hello All,
>>> 
>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>> 
>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>> 
>>> 
>>> Best,
>>> Ke
>>> 
>>> 
>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344>
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
> 


Re: Multi Environment Support

Posted by Chamikara Jayalath <ch...@google.com>.
On Mon, Dec 13, 2021 at 11:12 AM Ke Wu <ke...@gmail.com> wrote:

> Agreed, that is why I was hesitant to complicate ServerConfiguration such
> that it can configure the expansion service it brings up.
>
> In this case, do you think we may just add one more option to disable the
> expansion service setup so for complicated use cases, expansion service can
> be brought up manually with desired options?
>
> Another question is, if we update java Expansion service to use its own
> option instead of blindly uses pipeline’s option [1], will this cause any
> backward compatibility concern?
>

I haven't fully followed the thread but this indeed will be a backwards
incompatible change. I believe currently we assume each expansion service
instance to serve a distinct environment. Also we artificially limit the
set of PipelineOptions that apply performing the expansion [1] since
arbitrary PipelineOptions specified through the ExpansionService do not get
propagated to the SDK that submits the pipeline to the runner. But we can
expand this list as needed so that the environment of each expansion
service is fully configurable through PipelineOptions

Thanks,
Cham

[1]
https://github.com/apache/beam/blob/f9b0d6d6290436ff2a357fcebff8a281cc383025/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L572


> Best,
> Ke
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528
>
> pipeline.getOptions() -> this.pipelineOptions
>
> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <ro...@google.com> wrote:
>
> I think bringing up the expansion service as part of the job server is
> just a convenience for the simple case.
>
> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Awesome, I am happy to make the update.
>
> One more question, I noticed that expansion service is currently being
> brought up as part of job server in Java [1], if this is the preferred
> approach, does it mean, we should update ServerConfiguration to include
> such configurations, like expansion server port [2]
>
> Best,
> Ke
>
> [1]
> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58
> [2]
> https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88
>
>
> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> An expansion service should be able to specify its own environment, as
> is done in Python, not blindly inherit it from the caller. Java should
> be updated to have the same flexibility.
>
> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hi Robert,
>
> After some more digging and exploration, I realized that it is not clean
> and straightforward to update PipelineOption to support such.
>
> Therefore, I spent more time exploring the external transform & expansion
> service route. However, I noticed that this approach has the same
> limitation where we cannot configure environments ourselves.
>
> This is because when ExpansionService register environment [1], it uses
> the pipeline options of the original pipeline. This limitation not only
> restrict our capability to configure a different external environment in
> external mode but also restrict Docker mode to specify docker image for
> different SDKs but use defaults.
>
> Is this expected or do you think ExpansionService should be updated to be
> configurable of the environment it registers ?
>
> Best,
> Ke
>
>
> [1]
>
> Java:
> https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>
> Python:
>
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>
>
>
> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> Could you give a concrete example of what such a specification would look
> like?
>
> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hi Robert,
>
> Thanks for the pointer, using expansion service hack seems to work!
>
> On the other hand, since PipelineOptions is the place to configure
> external service address anyway, do you think it makes sense to expand it
> so it is capable of specifying multiple external environment to external
> service address mapping?
>
> Best,
> Ke
>
> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> I have a quick follow up questions.
>
> When using multiple external environments, is there a way to configure the
> multiple external service address? It looks like the current
> PipelineOptions only supports specifying one external address.
>
>
> PipelineOptions wasn't really built with the idea of multiple distinct
> environments in mind. One hack you could do is put one of the
> environments in an expansion service with its own environment (as if
> it were written in a different language) and configure that
> environment separately.
>
> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>
> This is great, let me try it out.
>
> Best,
> Ke
>
> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> I am able to annotate/mark a java transform by setting its resource hints
> [1] as well, which resulted in a different environment id, e.g.
>
> beam:env:docker:v1 VS beam:env:docker:v11
>
> Is this on the right track?
>
>
> Yep.
>
> If Yes, I suppose then I need to configure job bundle factory to be able
> to understand multiple environments and configure them separately as well.
>
>
> It should already do the right thing here. That's how multi-language works.
>
> [1]
> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>
> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Ideally, we do not want to expose anything directly to users and we, as
> the framework and platform provider, separate things out under the hood.
>
> I would expect users to author their DoFn(s) in the same way as they do
> right now, but we expect to change the DoFn(s) that we provide, will be
> annotated/marked so that it can be recognized during runtime.
>
> In our use case, application is executed in Kubernetes environment
> therefore, we are expecting to directly use different docker image to
> isolate dependencies.
>
> e.g. we have docker image A, which is beam core, that is used to start job
> server and runner process. We have a docker image B, which contains DoFn(s)
> that platform provides to serve as a external worker pool service to
> execute platform provided DoFn(s), last but not least, users would have
> their own docker image represent their application, which will be used to
> start the external worker pool service to handle their own UDF execution.
>
> Does this make sense ?
>
>
> In Python it's pretty trivial to annotate transforms (e.g. the
> "platform" transforms) which could be used to mark their environments
> prior to optimization (e.g. fusion). As mentioned, you could use
> resource hints (even a "dummy" hint like
> "use_platform_environment=True") to force these into a separate docker
> image as well.
>
> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>
> That sounds neat. I think that before you try to figure out how to change
> Beam to fit this usecase is to think about what would be the best way for
> users to specify these requirements when they are constructing the
> pipeline. Once you have some samples that you could share the community
> would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class
> loader setup, if so then we could probably do away with multiple
> environments and try to have DoFn's recognize their specific class loader
> configuration and replicate it on the SDK harness side.
>
> Also, for performance reasons users may want to resolve their dependency
> issues to create a maximally fused graph to limit performance impact due to
> the encoding/decoding boundaries at the edges of those fused graphs.
>
> Finally, this could definitely apply to languages like Python and Go (now
> that Go has support for modules) as dependency issues are a common problem.
>
>
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks for the advice.
>
> Here are some more background:
>
> We are building a feature called “split deployment” such that, we can
> isolate framework/platform core from user code/dependencies to address
> couple of operational challenges such as dependency conflict,
> alert/exception triaging.
>
> With Beam’s portability framework, runner and sdk worker process naturally
> decouples beam core and user UDFs(DoFn), which is awesome! On top of this,
> we could further distinguish DoFn(s) that end user authors from DoFn(s)
> that platform provides, therefore, we would like these DoFn(s) to be
> executed in different environments, even in the same language, e.g. Java.
>
> Therefore, I am exploring approaches and recommendations what are the
> proper way to do that.
>
> Let me know your thoughts, any feedback/advice is welcome.
>
> Best,
> Ke
>
> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>
> Resource hints have a limited use case and might fit your need.
> You could also try to use the expansion service XLang route to bring in a
> different Java environment.
> Finally, you could modify the pipeline proto that is generated directly to
> choose which environment is used for which PTransform.
>
> Can you provide additional details as to why you would want to have two
> separate java environments (e.g. incompatible versions of libraries)?
>
> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks Luke for the reply, do you know what is the preferred way to
> configure a PTransform to be executed in a different environment from
> another PTransform when both are in the same SDK, e.g. Java ?
>
> Best,
> Ke
>
> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>
> Environments that aren't exactly the same are already in separate
> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>
> Workarounds like getOnlyEnvironmentId would need to be removed. It may
> also be effectively dead-code.
>
> 1:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>
> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hello All,
>
> We have a use case where in a java portable pipeline, we would like to
> have multiple environments setup in order that some executable stage runs
> in one environment while some other executable stages runs in another
> environment. Couple of questions on this:
>
> 1. Is this current supported? I noticed a TODO in [1] which suggests it is
> feature pending support
> 2. If we did support it, what would the ideal mechanism to distinguish
> ParDo/ExecutableStage to be executed in different environment, is it
> through ResourceHints?
>
>
> Best,
> Ke
>
>
> [1]
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>
>
>
>
>
>
>
>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Agreed, that is why I was hesitant to complicate ServerConfiguration such that it can configure the expansion service it brings up.

In this case, do you think we may just add one more option to disable the expansion service setup so for complicated use cases, expansion service can be brought up manually with desired options?

Another question is, if we update java Expansion service to use its own option instead of blindly uses pipeline’s option [1], will this cause any backward compatibility concern?

Best,
Ke

[1] https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528 <https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L528>

pipeline.getOptions() -> this.pipelineOptions

> On Dec 13, 2021, at 10:40 AM, Robert Bradshaw <ro...@google.com> wrote:
> 
> I think bringing up the expansion service as part of the job server is
> just a convenience for the simple case.
> 
> On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> Awesome, I am happy to make the update.
>> 
>> One more question, I noticed that expansion service is currently being brought up as part of job server in Java [1], if this is the preferred approach, does it mean, we should update ServerConfiguration to include such configurations, like expansion server port [2]
>> 
>> Best,
>> Ke
>> 
>> [1] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58
>> [2] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88
>> 
>> 
>> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> An expansion service should be able to specify its own environment, as
>> is done in Python, not blindly inherit it from the caller. Java should
>> be updated to have the same flexibility.
>> 
>> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hi Robert,
>> 
>> After some more digging and exploration, I realized that it is not clean and straightforward to update PipelineOption to support such.
>> 
>> Therefore, I spent more time exploring the external transform & expansion service route. However, I noticed that this approach has the same limitation where we cannot configure environments ourselves.
>> 
>> This is because when ExpansionService register environment [1], it uses the pipeline options of the original pipeline. This limitation not only restrict our capability to configure a different external environment in external mode but also restrict Docker mode to specify docker image for different SDKs but use defaults.
>> 
>> Is this expected or do you think ExpansionService should be updated to be configurable of the environment it registers ?
>> 
>> Best,
>> Ke
>> 
>> 
>> [1]
>> 
>> Java: https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>> 
>> Python:
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>> 
>> 
>> 
>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> Could you give a concrete example of what such a specification would look like?
>> 
>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hi Robert,
>> 
>> Thanks for the pointer, using expansion service hack seems to work!
>> 
>> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>> 
>> Best,
>> Ke
>> 
>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> I have a quick follow up questions.
>> 
>> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
>> 
>> 
>> PipelineOptions wasn't really built with the idea of multiple distinct
>> environments in mind. One hack you could do is put one of the
>> environments in an expansion service with its own environment (as if
>> it were written in a different language) and configure that
>> environment separately.
>> 
>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>> 
>> This is great, let me try it out.
>> 
>> Best,
>> Ke
>> 
>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>> 
>> beam:env:docker:v1 VS beam:env:docker:v11
>> 
>> Is this on the right track?
>> 
>> 
>> Yep.
>> 
>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>> 
>> 
>> It should already do the right thing here. That's how multi-language works.
>> 
>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>> 
>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>> 
>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>> 
>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>> 
>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>> 
>> Does this make sense ?
>> 
>> 
>> In Python it's pretty trivial to annotate transforms (e.g. the
>> "platform" transforms) which could be used to mark their environments
>> prior to optimization (e.g. fusion). As mentioned, you could use
>> resource hints (even a "dummy" hint like
>> "use_platform_environment=True") to force these into a separate docker
>> image as well.
>> 
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>> 
>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>> 
>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>> 
>> 
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks for the advice.
>> 
>> Here are some more background:
>> 
>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>> 
>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>> 
>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>> 
>> Let me know your thoughts, any feedback/advice is welcome.
>> 
>> Best,
>> Ke
>> 
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>> 
>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>> 
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>> 
>> Best,
>> Ke
>> 
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>> 
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>> 
>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>> 
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hello All,
>> 
>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>> 
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>> 
>> 
>> Best,
>> Ke
>> 
>> 
>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>> 
>> 
>> 
>> 
>> 
>> 
>> 


Re: Multi Environment Support

Posted by Robert Bradshaw <ro...@google.com>.
I think bringing up the expansion service as part of the job server is
just a convenience for the simple case.

On Mon, Dec 13, 2021 at 9:54 AM Ke Wu <ke...@gmail.com> wrote:
>
> Awesome, I am happy to make the update.
>
> One more question, I noticed that expansion service is currently being brought up as part of job server in Java [1], if this is the preferred approach, does it mean, we should update ServerConfiguration to include such configurations, like expansion server port [2]
>
> Best,
> Ke
>
> [1] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58
> [2] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88
>
>
> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> An expansion service should be able to specify its own environment, as
> is done in Python, not blindly inherit it from the caller. Java should
> be updated to have the same flexibility.
>
> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hi Robert,
>
> After some more digging and exploration, I realized that it is not clean and straightforward to update PipelineOption to support such.
>
> Therefore, I spent more time exploring the external transform & expansion service route. However, I noticed that this approach has the same limitation where we cannot configure environments ourselves.
>
> This is because when ExpansionService register environment [1], it uses the pipeline options of the original pipeline. This limitation not only restrict our capability to configure a different external environment in external mode but also restrict Docker mode to specify docker image for different SDKs but use defaults.
>
> Is this expected or do you think ExpansionService should be updated to be configurable of the environment it registers ?
>
> Best,
> Ke
>
>
> [1]
>
> Java: https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>
> Python:
> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>
>
>
> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> Could you give a concrete example of what such a specification would look like?
>
> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hi Robert,
>
> Thanks for the pointer, using expansion service hack seems to work!
>
> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>
> Best,
> Ke
>
> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> I have a quick follow up questions.
>
> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
>
>
> PipelineOptions wasn't really built with the idea of multiple distinct
> environments in mind. One hack you could do is put one of the
> environments in an expansion service with its own environment (as if
> it were written in a different language) and configure that
> environment separately.
>
> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>
> This is great, let me try it out.
>
> Best,
> Ke
>
> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>
> beam:env:docker:v1 VS beam:env:docker:v11
>
> Is this on the right track?
>
>
> Yep.
>
> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>
>
> It should already do the right thing here. That's how multi-language works.
>
> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>
> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>
> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>
> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>
> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>
> Does this make sense ?
>
>
> In Python it's pretty trivial to annotate transforms (e.g. the
> "platform" transforms) which could be used to mark their environments
> prior to optimization (e.g. fusion). As mentioned, you could use
> resource hints (even a "dummy" hint like
> "use_platform_environment=True") to force these into a separate docker
> image as well.
>
> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>
> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>
> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>
> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>
>
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks for the advice.
>
> Here are some more background:
>
> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>
> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>
> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>
> Let me know your thoughts, any feedback/advice is welcome.
>
> Best,
> Ke
>
> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>
> Resource hints have a limited use case and might fit your need.
> You could also try to use the expansion service XLang route to bring in a different Java environment.
> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>
> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>
> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>
> Best,
> Ke
>
> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>
> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>
> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>
> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>
> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hello All,
>
> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>
> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>
>
> Best,
> Ke
>
>
> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>
>
>
>
>
>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Awesome, I am happy to make the update.

One more question, I noticed that expansion service is currently being brought up as part of job server in Java [1], if this is the preferred approach, does it mean, we should update ServerConfiguration to include such configurations, like expansion server port [2]

Best,
Ke

[1] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L58> 
[2] https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88 <https://github.com/apache/beam/blob/master/runners/java-job-service/src/main/java/org/apache/beam/runners/jobsubmission/JobServerDriver.java#L88> 


> On Dec 9, 2021, at 4:34 PM, Robert Bradshaw <ro...@google.com> wrote:
> 
> An expansion service should be able to specify its own environment, as
> is done in Python, not blindly inherit it from the caller. Java should
> be updated to have the same flexibility.
> 
> On Thu, Dec 9, 2021 at 4:27 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> Hi Robert,
>> 
>> After some more digging and exploration, I realized that it is not clean and straightforward to update PipelineOption to support such.
>> 
>> Therefore, I spent more time exploring the external transform & expansion service route. However, I noticed that this approach has the same limitation where we cannot configure environments ourselves.
>> 
>> This is because when ExpansionService register environment [1], it uses the pipeline options of the original pipeline. This limitation not only restrict our capability to configure a different external environment in external mode but also restrict Docker mode to specify docker image for different SDKs but use defaults.
>> 
>> Is this expected or do you think ExpansionService should be updated to be configurable of the environment it registers ?
>> 
>> Best,
>> Ke
>> 
>> 
>> [1]
>> 
>> Java: https://github.com/apache/beam/blob/master/sdks/java/expansion-service/src/main/java/org/apache/beam/sdk/expansion/service/ExpansionService.java#L526
>> 
>> Python:
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/runners/portability/expansion_service.py#L39
>> 
>> 
>> 
>> On Nov 17, 2021, at 1:02 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> Could you give a concrete example of what such a specification would look like?
>> 
>> On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hi Robert,
>> 
>> Thanks for the pointer, using expansion service hack seems to work!
>> 
>> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>> 
>> Best,
>> Ke
>> 
>> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> I have a quick follow up questions.
>> 
>> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
>> 
>> 
>> PipelineOptions wasn't really built with the idea of multiple distinct
>> environments in mind. One hack you could do is put one of the
>> environments in an expansion service with its own environment (as if
>> it were written in a different language) and configure that
>> environment separately.
>> 
>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>> 
>> This is great, let me try it out.
>> 
>> Best,
>> Ke
>> 
>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>> 
>> beam:env:docker:v1 VS beam:env:docker:v11
>> 
>> Is this on the right track?
>> 
>> 
>> Yep.
>> 
>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>> 
>> 
>> It should already do the right thing here. That's how multi-language works.
>> 
>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>> 
>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>> 
>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>> 
>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>> 
>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>> 
>> Does this make sense ?
>> 
>> 
>> In Python it's pretty trivial to annotate transforms (e.g. the
>> "platform" transforms) which could be used to mark their environments
>> prior to optimization (e.g. fusion). As mentioned, you could use
>> resource hints (even a "dummy" hint like
>> "use_platform_environment=True") to force these into a separate docker
>> image as well.
>> 
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>> 
>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>> 
>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>> 
>> 
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks for the advice.
>> 
>> Here are some more background:
>> 
>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>> 
>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>> 
>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>> 
>> Let me know your thoughts, any feedback/advice is welcome.
>> 
>> Best,
>> Ke
>> 
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>> 
>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>> 
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>> 
>> Best,
>> Ke
>> 
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>> 
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>> 
>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>> 
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hello All,
>> 
>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>> 
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>> 
>> 
>> Best,
>> Ke
>> 
>> 
>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>> 
>> 
>> 
>> 
>> 
>> 


Re: Multi Environment Support

Posted by Robert Bradshaw <ro...@google.com>.
Could you give a concrete example of what such a specification would look like?

On Wed, Nov 17, 2021 at 11:05 AM Ke Wu <ke...@gmail.com> wrote:
>
> Hi Robert,
>
> Thanks for the pointer, using expansion service hack seems to work!
>
> On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?
>
> Best,
> Ke
>
> > On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
> >
> > On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
> >>
> >> I have a quick follow up questions.
> >>
> >> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
> >
> > PipelineOptions wasn't really built with the idea of multiple distinct
> > environments in mind. One hack you could do is put one of the
> > environments in an expansion service with its own environment (as if
> > it were written in a different language) and configure that
> > environment separately.
> >
> >>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>> This is great, let me try it out.
> >>>
> >>> Best,
> >>> Ke
> >>>
> >>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
> >>>>
> >>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
> >>>>>
> >>>>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
> >>>>>
> >>>>> beam:env:docker:v1 VS beam:env:docker:v11
> >>>>>
> >>>>> Is this on the right track?
> >>>>
> >>>> Yep.
> >>>>
> >>>>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
> >>>>
> >>>> It should already do the right thing here. That's how multi-language works.
> >>>>
> >>>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
> >>>>>
> >>>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
> >>>>>
> >>>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
> >>>>>
> >>>>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
> >>>>>
> >>>>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
> >>>>>
> >>>>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
> >>>>>
> >>>>> Does this make sense ?
> >>>>>
> >>>>>
> >>>>> In Python it's pretty trivial to annotate transforms (e.g. the
> >>>>> "platform" transforms) which could be used to mark their environments
> >>>>> prior to optimization (e.g. fusion). As mentioned, you could use
> >>>>> resource hints (even a "dummy" hint like
> >>>>> "use_platform_environment=True") to force these into a separate docker
> >>>>> image as well.
> >>>>>
> >>>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> >>>>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
> >>>>>
> >>>>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
> >>>>>
> >>>>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
> >>>>>
> >>>>>
> >>>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> Thanks for the advice.
> >>>>>
> >>>>> Here are some more background:
> >>>>>
> >>>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
> >>>>>
> >>>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
> >>>>>
> >>>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
> >>>>>
> >>>>> Let me know your thoughts, any feedback/advice is welcome.
> >>>>>
> >>>>> Best,
> >>>>> Ke
> >>>>>
> >>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>> Resource hints have a limited use case and might fit your need.
> >>>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
> >>>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
> >>>>>
> >>>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
> >>>>>
> >>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
> >>>>>
> >>>>> Best,
> >>>>> Ke
> >>>>>
> >>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
> >>>>>
> >>>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
> >>>>>
> >>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
> >>>>>
> >>>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
> >>>>>
> >>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
> >>>>>
> >>>>>
> >>>>> Hello All,
> >>>>>
> >>>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
> >>>>>
> >>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
> >>>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
> >>>>>
> >>>>>
> >>>>> Best,
> >>>>> Ke
> >>>>>
> >>>>>
> >>>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
> >>>>>
> >>>>>
> >>>
> >>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Hi Robert,

Thanks for the pointer, using expansion service hack seems to work!

On the other hand, since PipelineOptions is the place to configure external service address anyway, do you think it makes sense to expand it so it is capable of specifying multiple external environment to external service address mapping?

Best,
Ke

> On Oct 6, 2021, at 2:09 PM, Robert Bradshaw <ro...@google.com> wrote:
> 
> On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> I have a quick follow up questions.
>> 
>> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.
> 
> PipelineOptions wasn't really built with the idea of multiple distinct
> environments in mind. One hack you could do is put one of the
> environments in an expansion service with its own environment (as if
> it were written in a different language) and configure that
> environment separately.
> 
>>> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> This is great, let me try it out.
>>> 
>>> Best,
>>> Ke
>>> 
>>>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>>>> 
>>>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>>>>> 
>>>>> beam:env:docker:v1 VS beam:env:docker:v11
>>>>> 
>>>>> Is this on the right track?
>>>> 
>>>> Yep.
>>>> 
>>>>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>>>> 
>>>> It should already do the right thing here. That's how multi-language works.
>>>> 
>>>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>>>> 
>>>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>>>>> 
>>>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>>>>> 
>>>>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>>>>> 
>>>>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>>>>> 
>>>>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>>>>> 
>>>>> Does this make sense ?
>>>>> 
>>>>> 
>>>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>>>> "platform" transforms) which could be used to mark their environments
>>>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>>>> resource hints (even a "dummy" hint like
>>>>> "use_platform_environment=True") to force these into a separate docker
>>>>> image as well.
>>>>> 
>>>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>>>>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>>>>> 
>>>>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>>>>> 
>>>>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>>>>> 
>>>>> 
>>>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Thanks for the advice.
>>>>> 
>>>>> Here are some more background:
>>>>> 
>>>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>>>> 
>>>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>>>> 
>>>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>>>> 
>>>>> Let me know your thoughts, any feedback/advice is welcome.
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> Resource hints have a limited use case and might fit your need.
>>>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>>>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>>>> 
>>>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>>>> 
>>>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>>> 
>>>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>>> 
>>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>>>> 
>>>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>>> 
>>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> 
>>>>> Hello All,
>>>>> 
>>>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>>>> 
>>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> 
>>>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>>> 
>>>>> 
>>> 
>> 


Re: Multi Environment Support

Posted by Robert Bradshaw <ro...@google.com>.
On Wed, Oct 6, 2021 at 1:12 PM Ke Wu <ke...@gmail.com> wrote:
>
> I have a quick follow up questions.
>
> When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.

PipelineOptions wasn't really built with the idea of multiple distinct
environments in mind. One hack you could do is put one of the
environments in an expansion service with its own environment (as if
it were written in a different language) and configure that
environment separately.

> > On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
> >
> > This is great, let me try it out.
> >
> > Best,
> > Ke
> >
> >> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
> >>
> >> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
> >>>
> >>> beam:env:docker:v1 VS beam:env:docker:v11
> >>>
> >>> Is this on the right track?
> >>
> >> Yep.
> >>
> >>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
> >>
> >> It should already do the right thing here. That's how multi-language works.
> >>
> >>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
> >>>
> >>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
> >>>
> >>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>>
> >>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
> >>>
> >>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
> >>>
> >>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
> >>>
> >>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
> >>>
> >>> Does this make sense ?
> >>>
> >>>
> >>> In Python it's pretty trivial to annotate transforms (e.g. the
> >>> "platform" transforms) which could be used to mark their environments
> >>> prior to optimization (e.g. fusion). As mentioned, you could use
> >>> resource hints (even a "dummy" hint like
> >>> "use_platform_environment=True") to force these into a separate docker
> >>> image as well.
> >>>
> >>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
> >>>
> >>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> >>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
> >>>
> >>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
> >>>
> >>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
> >>>
> >>>
> >>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>>
> >>> Thanks for the advice.
> >>>
> >>> Here are some more background:
> >>>
> >>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
> >>>
> >>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
> >>>
> >>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
> >>>
> >>> Let me know your thoughts, any feedback/advice is welcome.
> >>>
> >>> Best,
> >>> Ke
> >>>
> >>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
> >>>
> >>> Resource hints have a limited use case and might fit your need.
> >>> You could also try to use the expansion service XLang route to bring in a different Java environment.
> >>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
> >>>
> >>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
> >>>
> >>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>>
> >>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
> >>>
> >>> Best,
> >>> Ke
> >>>
> >>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
> >>>
> >>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
> >>>
> >>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
> >>>
> >>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
> >>>
> >>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
> >>>
> >>>
> >>> Hello All,
> >>>
> >>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
> >>>
> >>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
> >>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
> >>>
> >>>
> >>> Best,
> >>> Ke
> >>>
> >>>
> >>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
> >>>
> >>>
> >
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
I have a quick follow up questions. 

When using multiple external environments, is there a way to configure the multiple external service address? It looks like the current PipelineOptions only supports specifying one external address.

Best,
Ke

> On Oct 4, 2021, at 4:12 PM, Ke Wu <ke...@gmail.com> wrote:
> 
> This is great, let me try it out.
> 
> Best,
> Ke
> 
>> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>>> 
>>> beam:env:docker:v1 VS beam:env:docker:v11
>>> 
>>> Is this on the right track?
>> 
>> Yep.
>> 
>>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
>> 
>> It should already do the right thing here. That's how multi-language works.
>> 
>>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>>> 
>>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>>> 
>>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> 
>>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>>> 
>>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>>> 
>>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>>> 
>>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>>> 
>>> Does this make sense ?
>>> 
>>> 
>>> In Python it's pretty trivial to annotate transforms (e.g. the
>>> "platform" transforms) which could be used to mark their environments
>>> prior to optimization (e.g. fusion). As mentioned, you could use
>>> resource hints (even a "dummy" hint like
>>> "use_platform_environment=True") to force these into a separate docker
>>> image as well.
>>> 
>>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>>> 
>>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>>> 
>>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>>> 
>>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>>> 
>>> 
>>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> 
>>> Thanks for the advice.
>>> 
>>> Here are some more background:
>>> 
>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>> 
>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>> 
>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>> 
>>> Let me know your thoughts, any feedback/advice is welcome.
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>> 
>>> Resource hints have a limited use case and might fit your need.
>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>> 
>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>> 
>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> 
>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>> 
>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>> 
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>> 
>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>> 
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> 
>>> Hello All,
>>> 
>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>> 
>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>> 
>>> 
>>> Best,
>>> Ke
>>> 
>>> 
>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>> 
>>> 
> 


Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
This is great, let me try it out.

Best,
Ke

> On Sep 30, 2021, at 6:06 PM, Robert Bradshaw <ro...@google.com> wrote:
> 
> On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>> 
>> beam:env:docker:v1 VS beam:env:docker:v11
>> 
>> Is this on the right track?
> 
> Yep.
> 
>> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.
> 
> It should already do the right thing here. That's how multi-language works.
> 
>> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>> 
>> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>> 
>> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>> 
>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>> 
>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>> 
>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>> 
>> Does this make sense ?
>> 
>> 
>> In Python it's pretty trivial to annotate transforms (e.g. the
>> "platform" transforms) which could be used to mark their environments
>> prior to optimization (e.g. fusion). As mentioned, you could use
>> resource hints (even a "dummy" hint like
>> "use_platform_environment=True") to force these into a separate docker
>> image as well.
>> 
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>> 
>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>> 
>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>> 
>> 
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks for the advice.
>> 
>> Here are some more background:
>> 
>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>> 
>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>> 
>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>> 
>> Let me know your thoughts, any feedback/advice is welcome.
>> 
>> Best,
>> Ke
>> 
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>> 
>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>> 
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>> 
>> Best,
>> Ke
>> 
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>> 
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>> 
>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>> 
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>> 
>> 
>> Hello All,
>> 
>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>> 
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>> 
>> 
>> Best,
>> Ke
>> 
>> 
>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>> 
>> 


Re: Multi Environment Support

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Sep 30, 2021 at 6:00 PM Ke Wu <ke...@gmail.com> wrote:
>
> I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g.
>
> beam:env:docker:v1 VS beam:env:docker:v11
>
> Is this on the right track?

Yep.

> If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well.

It should already do the right thing here. That's how multi-language works.

> [1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218
>
> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
>
> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>
> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>
> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>
> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>
> Does this make sense ?
>
>
> In Python it's pretty trivial to annotate transforms (e.g. the
> "platform" transforms) which could be used to mark their environments
> prior to optimization (e.g. fusion). As mentioned, you could use
> resource hints (even a "dummy" hint like
> "use_platform_environment=True") to force these into a separate docker
> image as well.
>
> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>
> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>
> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>
> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>
>
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks for the advice.
>
> Here are some more background:
>
> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>
> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>
> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>
> Let me know your thoughts, any feedback/advice is welcome.
>
> Best,
> Ke
>
> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>
> Resource hints have a limited use case and might fit your need.
> You could also try to use the expansion service XLang route to bring in a different Java environment.
> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>
> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>
> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>
> Best,
> Ke
>
> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>
> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>
> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>
> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>
> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>
>
> Hello All,
>
> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>
> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>
>
> Best,
> Ke
>
>
> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
I am able to annotate/mark a java transform by setting its resource hints [1] as well, which resulted in a different environment id, e.g. 

beam:env:docker:v1 VS beam:env:docker:v11


Is this on the right track? If Yes, I suppose then I need to configure job bundle factory to be able to understand multiple environments and configure them separately as well. 

[1] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218 <https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/PTransform.java#L218> 

> On Sep 30, 2021, at 10:34 AM, Robert Bradshaw <ro...@google.com> wrote:
> 
> On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>> 
>> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>> 
>> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>> 
>> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>> 
>> Does this make sense ?
> 
> In Python it's pretty trivial to annotate transforms (e.g. the
> "platform" transforms) which could be used to mark their environments
> prior to optimization (e.g. fusion). As mentioned, you could use
> resource hints (even a "dummy" hint like
> "use_platform_environment=True") to force these into a separate docker
> image as well.
> 
>> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>> 
>> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
>> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>> 
>> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>> 
>> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>> 
>> 
>> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>> 
>>> Thanks for the advice.
>>> 
>>> Here are some more background:
>>> 
>>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>> 
>>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>> 
>>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>> 
>>> Let me know your thoughts, any feedback/advice is welcome.
>>> 
>>> Best,
>>> Ke
>>> 
>>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>> 
>>> Resource hints have a limited use case and might fit your need.
>>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>> 
>>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>> 
>>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>>> 
>>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>>> 
>>>> Best,
>>>> Ke
>>>> 
>>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>> 
>>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>> 
>>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>>> 
>>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>> 
>>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>>> 
>>>>> Hello All,
>>>>> 
>>>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>>>> 
>>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>>>> 
>>>>> 
>>>>> Best,
>>>>> Ke
>>>>> 
>>>>> 
>>>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344


Re: Multi Environment Support

Posted by Robert Bradshaw <ro...@google.com>.
On Thu, Sep 30, 2021 at 9:25 AM Ke Wu <ke...@gmail.com> wrote:
>
> Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood.
>
> I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.
>
> In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies.
>
> e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.
>
> Does this make sense ?

In Python it's pretty trivial to annotate transforms (e.g. the
"platform" transforms) which could be used to mark their environments
prior to optimization (e.g. fusion). As mentioned, you could use
resource hints (even a "dummy" hint like
"use_platform_environment=True") to force these into a separate docker
image as well.

> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
>
> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
>
> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
>
> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
>
>
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:
>>
>> Thanks for the advice.
>>
>> Here are some more background:
>>
>> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
>>
>> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
>>
>> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
>>
>> Let me know your thoughts, any feedback/advice is welcome.
>>
>> Best,
>> Ke
>>
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>>
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>>
>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>>
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>>>
>>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>>>
>>> Best,
>>> Ke
>>>
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>>
>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>>
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>>
>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>>
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>>>
>>>> Hello All,
>>>>
>>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>>>
>>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>>>
>>>>
>>>> Best,
>>>> Ke
>>>>
>>>>
>>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>
>>>
>>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Ideally, we do not want to expose anything directly to users and we, as the framework and platform provider, separate things out under the hood. 

I would expect users to author their DoFn(s) in the same way as they do right now, but we expect to change the DoFn(s) that we provide, will be annotated/marked so that it can be recognized during runtime.

In our use case, application is executed in Kubernetes environment therefore, we are expecting to directly use different docker image to isolate dependencies. 

e.g. we have docker image A, which is beam core, that is used to start job server and runner process. We have a docker image B, which contains DoFn(s) that platform provides to serve as a external worker pool service to execute platform provided DoFn(s), last but not least, users would have their own docker image represent their application, which will be used to start the external worker pool service to handle their own UDF execution.

Does this make sense ?

> On Sep 29, 2021, at 1:09 PM, Luke Cwik <lc...@google.com> wrote:
> 
> That sounds neat. I think that before you try to figure out how to change Beam to fit this usecase is to think about what would be the best way for users to specify these requirements when they are constructing the pipeline. Once you have some samples that you could share the community would probably be able to give you more pointed advice.
> For example will they be running one application with a complicated class loader setup, if so then we could probably do away with multiple environments and try to have DoFn's recognize their specific class loader configuration and replicate it on the SDK harness side.
> 
> Also, for performance reasons users may want to resolve their dependency issues to create a maximally fused graph to limit performance impact due to the encoding/decoding boundaries at the edges of those fused graphs.
> 
> Finally, this could definitely apply to languages like Python and Go (now that Go has support for modules) as dependency issues are a common problem.
> 
> 
> On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
> Thanks for the advice.
> 
> Here are some more background:
> 
> We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.
> 
> With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.
> 
> Therefore, I am exploring approaches and recommendations what are the proper way to do that.
> 
> Let me know your thoughts, any feedback/advice is welcome. 
> 
> Best,
> Ke
> 
>> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>> 
>> Resource hints have a limited use case and might fit your need.
>> You could also try to use the expansion service XLang route to bring in a different Java environment.
>> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
>> 
>> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
>> 
>> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
>> 
>> Best,
>> Ke
>> 
>>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>>> 
>>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>> 
>>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>>> 
>>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
>>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>>> Hello All,
>>> 
>>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>>> 
>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>>> 
>>> 
>>> Best,
>>> Ke 
>>> 
>>> 
>>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344> 
>> 
> 


Re: Multi Environment Support

Posted by Luke Cwik <lc...@google.com>.
That sounds neat. I think that before you try to figure out how to change
Beam to fit this usecase is to think about what would be the best way for
users to specify these requirements when they are constructing the
pipeline. Once you have some samples that you could share the community
would probably be able to give you more pointed advice.
For example will they be running one application with a complicated class
loader setup, if so then we could probably do away with multiple
environments and try to have DoFn's recognize their specific class loader
configuration and replicate it on the SDK harness side.

Also, for performance reasons users may want to resolve their dependency
issues to create a maximally fused graph to limit performance impact due to
the encoding/decoding boundaries at the edges of those fused graphs.

Finally, this could definitely apply to languages like Python and Go (now
that Go has support for modules) as dependency issues are a common problem.


On Wed, Sep 29, 2021 at 11:47 AM Ke Wu <ke...@gmail.com> wrote:

> Thanks for the advice.
>
> Here are some more background:
>
> We are building a feature called “split deployment” such that, we can
> isolate framework/platform core from user code/dependencies to address
> couple of operational challenges such as dependency conflict,
> alert/exception triaging.
>
> With Beam’s portability framework, runner and sdk worker process naturally
> decouples beam core and user UDFs(DoFn), which is awesome! On top of this,
> we could further distinguish DoFn(s) that end user authors from DoFn(s)
> that platform provides, therefore, we would like these DoFn(s) to be
> executed in different environments, even in the same language, e.g. Java.
>
> Therefore, I am exploring approaches and recommendations what are the
> proper way to do that.
>
> Let me know your thoughts, any feedback/advice is welcome.
>
> Best,
> Ke
>
> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
>
> Resource hints have a limited use case and might fit your need.
> You could also try to use the expansion service XLang route to bring in a
> different Java environment.
> Finally, you could modify the pipeline proto that is generated directly to
> choose which environment is used for which PTransform.
>
> Can you provide additional details as to why you would want to have two
> separate java environments (e.g. incompatible versions of libraries)?
>
> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:
>
>> Thanks Luke for the reply, do you know what is the preferred way to
>> configure a PTransform to be executed in a different environment from
>> another PTransform when both are in the same SDK, e.g. Java ?
>>
>> Best,
>> Ke
>>
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>>
>> Environments that aren't exactly the same are already in separate
>> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>>
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may
>> also be effectively dead-code.
>>
>> 1:
>> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>>
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>>
>>> Hello All,
>>>
>>> We have a use case where in a java portable pipeline, we would like to
>>> have multiple environments setup in order that some executable stage runs
>>> in one environment while some other executable stages runs in another
>>> environment. Couple of questions on this:
>>>
>>> 1. Is this current supported? I noticed a TODO in [1] which suggests it
>>> is feature pending support
>>> 2. If we did support it, what would the ideal mechanism to distinguish
>>> ParDo/ExecutableStage to be executed in different environment, is it
>>> through ResourceHints?
>>>
>>>
>>> Best,
>>> Ke
>>>
>>>
>>> [1]
>>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>>
>>>
>>
>>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Thanks for the advice.

Here are some more background:

We are building a feature called “split deployment” such that, we can isolate framework/platform core from user code/dependencies to address couple of operational challenges such as dependency conflict, alert/exception triaging.

With Beam’s portability framework, runner and sdk worker process naturally decouples beam core and user UDFs(DoFn), which is awesome! On top of this, we could further distinguish DoFn(s) that end user authors from DoFn(s) that platform provides, therefore, we would like these DoFn(s) to be executed in different environments, even in the same language, e.g. Java.

Therefore, I am exploring approaches and recommendations what are the proper way to do that.

Let me know your thoughts, any feedback/advice is welcome. 

Best,
Ke

> On Sep 27, 2021, at 11:56 AM, Luke Cwik <lc...@google.com> wrote:
> 
> Resource hints have a limited use case and might fit your need.
> You could also try to use the expansion service XLang route to bring in a different Java environment.
> Finally, you could modify the pipeline proto that is generated directly to choose which environment is used for which PTransform.
> 
> Can you provide additional details as to why you would want to have two separate java environments (e.g. incompatible versions of libraries)?
> 
> On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
> Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?
> 
> Best,
> Ke
> 
>> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lcwik@google.com <ma...@google.com>> wrote:
>> 
>> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>> 
>> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
>> 
>> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
>> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
>> Hello All,
>> 
>> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
>> 
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
>> 
>> 
>> Best,
>> Ke 
>> 
>> 
>> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344> 
> 


Re: Multi Environment Support

Posted by Luke Cwik <lc...@google.com>.
Resource hints have a limited use case and might fit your need.
You could also try to use the expansion service XLang route to bring in a
different Java environment.
Finally, you could modify the pipeline proto that is generated directly to
choose which environment is used for which PTransform.

Can you provide additional details as to why you would want to have two
separate java environments (e.g. incompatible versions of libraries)?

On Wed, Sep 22, 2021 at 3:41 PM Ke Wu <ke...@gmail.com> wrote:

> Thanks Luke for the reply, do you know what is the preferred way to
> configure a PTransform to be executed in a different environment from
> another PTransform when both are in the same SDK, e.g. Java ?
>
> Best,
> Ke
>
> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
>
> Environments that aren't exactly the same are already in separate
> ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
>
> Workarounds like getOnlyEnvironmentId would need to be removed. It may
> also be effectively dead-code.
>
> 1:
> https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144
>
> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:
>
>> Hello All,
>>
>> We have a use case where in a java portable pipeline, we would like to
>> have multiple environments setup in order that some executable stage runs
>> in one environment while some other executable stages runs in another
>> environment. Couple of questions on this:
>>
>> 1. Is this current supported? I noticed a TODO in [1] which suggests it
>> is feature pending support
>> 2. If we did support it, what would the ideal mechanism to distinguish
>> ParDo/ExecutableStage to be executed in different environment, is it
>> through ResourceHints?
>>
>>
>> Best,
>> Ke
>>
>>
>> [1]
>> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>>
>>
>
>

Re: Multi Environment Support

Posted by Ke Wu <ke...@gmail.com>.
Thanks Luke for the reply, do you know what is the preferred way to configure a PTransform to be executed in a different environment from another PTransform when both are in the same SDK, e.g. Java ?

Best,
Ke

> On Sep 21, 2021, at 9:48 PM, Luke Cwik <lc...@google.com> wrote:
> 
> Environments that aren't exactly the same are already in separate ExecutableStages. The GreedyPCollectionFuser ensures that today[1].
> 
> Workarounds like getOnlyEnvironmentId would need to be removed. It may also be effectively dead-code.
> 
> 1: https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144 <https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144>
> On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke.wu.cs@gmail.com <ma...@gmail.com>> wrote:
> Hello All,
> 
> We have a use case where in a java portable pipeline, we would like to have multiple environments setup in order that some executable stage runs in one environment while some other executable stages runs in another environment. Couple of questions on this:
> 
> 1. Is this current supported? I noticed a TODO in [1] which suggests it is feature pending support
> 2. If we did support it, what would the ideal mechanism to distinguish ParDo/ExecutableStage to be executed in different environment, is it through ResourceHints?
> 
> 
> Best,
> Ke 
> 
> 
> [1] https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344 <https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344> 


Re: Multi Environment Support

Posted by Luke Cwik <lc...@google.com>.
Environments that aren't exactly the same are already in separate
ExecutableStages. The GreedyPCollectionFuser ensures that today[1].

Workarounds like getOnlyEnvironmentId would need to be removed. It may also
be effectively dead-code.

1:
https://github.com/apache/beam/blob/ebf2aacf37b97fc85b167271f184f61f5b06ddc3/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/graph/GreedyPCollectionFusers.java#L144

On Tue, Sep 21, 2021 at 1:45 PM Ke Wu <ke...@gmail.com> wrote:

> Hello All,
>
> We have a use case where in a java portable pipeline, we would like to
> have multiple environments setup in order that some executable stage runs
> in one environment while some other executable stages runs in another
> environment. Couple of questions on this:
>
> 1. Is this current supported? I noticed a TODO in [1] which suggests it is
> feature pending support
> 2. If we did support it, what would the ideal mechanism to distinguish
> ParDo/ExecutableStage to be executed in different environment, is it
> through ResourceHints?
>
>
> Best,
> Ke
>
>
> [1]
> https://github.com/apache/beam/blob/master/runners/core-construction-java/src/main/java/org/apache/beam/runners/core/construction/SdkComponents.java#L344
>
>