Posted to dev@beam.apache.org by Chamikara Jayalath via dev <de...@beam.apache.org> on 2022/11/24 01:09:15 UTC

Re: Easy Multi-language via a SchemaTransform-aware Expansion Service

Hi All,

The implementation of https://s.apache.org/easy-multi-language (with the
dynamic API for Python) was merged and should be available with Beam
2.44.0: https://github.com/apache/beam/pull/23413

Thanks,
Cham

On Fri, Aug 19, 2022 at 3:35 PM Chamikara Jayalath <ch...@google.com>
wrote:

> Hi All,
>
> Thanks for the comments so far. Seems like we generally agree on this
> proposal.
>
> Please see https://github.com/apache/beam/pull/22802 for a prototype
> implementation that adds the following.
>
> * Support for dynamically discovering and registering SchemaTransforms in
> the Java expansion service.
> * Support for dynamically discovering registered SchemaTransforms from the
> Python side.
> * Support for using SchemaTransforms in Python pipelines.
>
> Feel free to add more comments to the doc and/or the PR.
>
> Thanks,
> Cham
>
> On Mon, Aug 8, 2022 at 9:34 PM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>> I think the *DiscoverSchemaTransform()* RPC introduced in this proposal
>> and the ability to easily deploy/use available *SchemaTransforms* using
>> an expansion service essentially provide the tooling necessary for
>> implementing such a service. Such a service could even start up expansion
>> services to discover/list transforms available in given artifacts (for
>> example, jar files).
>>
>> Thanks,
>> Cham
>>
>> On Mon, Aug 8, 2022 at 3:48 PM Byron Ellis <by...@google.com> wrote:
>>
>>> I like that idea, sort of like Kafka’s Schema Service but for transforms?
>>>
>>> On Mon, Aug 8, 2022 at 2:45 PM Robert Bradshaw via dev <
>>> dev@beam.apache.org> wrote:
>>>
>>>> This is a great idea. I would like to approach this from the
>>>> perspective of making it easy to provide a catalog of well-defined
>>>> transforms for use in expansion services from typical SDKs and also
>>>> elsewhere (e.g. for documentation purposes, GUIs, etc.). Ideally
>>>> everything about what a transform is (its config, documentation,
>>>> expectations on inputs, etc.) can be specified programmatically in a
>>>> way that's much easier to both author and consume than it is now.
>>>>
>>>> On Thu, Aug 4, 2022 at 6:51 PM Chamikara Jayalath via dev
>>>> <de...@beam.apache.org> wrote:
>>>> >
>>>> > Hi All,
>>>> >
>>>> > I believe we can make the multi-language pipelines offering [1] much
>>>> easier to use by updating the expansion service to be fully aware of
>>>> SchemaTransforms. Additionally, this will make it easy to
>>>> register/discover/use transforms defined in one SDK from all other SDKs.
>>>> Specifically, we could add the following features.
>>>> >
>>>> > * The expansion service can be used to easily initialize and expand
>>>> transforms without the need for additional code.
>>>> > * The expansion service can be used to easily discover already registered
>>>> transforms.
>>>> > * Pipeline SDKs can generate user-friendly stub APIs based on
>>>> transforms registered with an expansion service, eliminating the need to
>>>> develop language-specific wrappers.
>>>> >
>>>> > Please see here for my proposal:
>>>> https://s.apache.org/easy-multi-language
>>>> >
>>>> > Lemme know if you have any comments/questions/suggestions :)
>>>> >
>>>> > Thanks,
>>>> > Cham
>>>> >
>>>> > [1]
>>>> https://beam.apache.org/documentation/programming-guide/#multi-language-pipelines
>>>> >
>>>>
>>>