Posted to dev@beam.apache.org by Chamikara Jayalath <ch...@google.com> on 2021/07/27 02:09:44 UTC

A simpler way to define and use Java cross-language transforms

Hi All,

Currently, to define Java cross-language transforms, users have to define
three new Java classes: a Registrar, a Builder and a Config Object [1].

While this might not be too hard for a Java programmer, learning Java and
developing/building/releasing new classes just to use existing Java
transforms may be cumbersome for Python and Go users. To further simplify
defining new Java cross-language transforms and using such transforms
from other SDKs, I would like to propose an update to the
cross-language transform expansion protocol.

Please see the following for details and let me know if you have any
comments.
https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing

Thanks,
Cham

[1]
https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms

Re: A simpler way to define and use Java cross-language transforms

Posted by Jan Lukavský <je...@seznam.cz>.
On 8/17/21 10:40 PM, Luke Cwik wrote:
>
>
> On Tue, Aug 17, 2021 at 1:28 PM Chamikara Jayalath
> <chamikara@google.com> wrote:
>
>
>
>     On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lcwik@google.com>
>     wrote:
>
>         I see the language differences but still feel as though there
>         is a pretty common base that would work for object oriented
>         languages and another for non-object oriented languages.
>
>
>     For now, the only property I think that can be clearly moved to a
>     common base is "class_name". I felt like adding a base just for
>     that is overkill but any suggestions to the PR are welcome :)
>
>
>         Authentication won't provide the right type of protection. For
>         example if GCP hosted an expansion service, any GCP customer
>         should be able to authenticate to use it but that wouldn't
>         mean that GCP would want arbitrary code to be executed. We
>         could have an allowlist of classes and methods that are able
>         to be invoked via this pattern.
>
>
>     Yeah, additional authentication mechanisms can be introduced to
>     make this safer. I think the bottom line is that the ability to
>     invoke Java transforms without introducing new Java code can be
>     appealing to many non-Java cross-language transform users. Also,
>     the proposed solution does not let users execute arbitrary Java
>     code. They are simply invoking classes/methods that are already
>     available in the expansion service in a controlled way. We can
>     introduce additional authentication mechanisms, allowlists etc. to
>     make this even safer.
>
>
>         Finally, this solution does break the abstraction of "hey I
>         want to execute BigQuery read with these parameters" since the
>         current proposal is about how to construct such a transform
>         via some method calls. I believe this will expose more sharp
>         edges around pipeline author and expansion service versioning
>         issues and places the onus onto the pipeline author or
>         expansion service to not break anything.
>
>
>     It does support instantiating a transform using constructor
>     parameters or a constructor method with parameters, and builder
>     methods. For example, BigQueryIO.readTableRows().from(String
>     tableSpec), where "BigQueryIO" is the class name, "readTableRows"
>     is the constructor method, and "from(String tableSpec)" is a
>     builder method. Did I miss any common pattern in addition to this?
>
>
> The issue isn't that you support building a BigQueryIO transform for 
> reading, it is that the way that the proto defines the XLang 
> transform is coupled directly to the code in how the transform is 
> built. For example, if someone wanted to create a C++ or Python
> version of the expansion service, then that someone would need to
> translate the Java code directly with all of its methods. The
> alternative is that every XLang transform has a specification and
> there is a specification-to-transform adapter that sits between the
> specification and the implementation of how that transform is
> constructed. This level of indirection provides a lot of value related 
> to renames of fields/methods, versioning, implementations in other 
> languages and security.
This is a good point. It sounds like we would like to verify client-server
API compatibility. Would sending the client's Beam version in the request
solve that? The server could then verify that it has the same version
(the strictest condition, but a mostly safe one). It would be
nice if this information could be used to route the request to a
specific instance in the case of a hosted expansion service (multiple
versions hosted behind the same hostname). I'm not that familiar with
gRPC, but could this be added to the request's metadata somehow? And
could that information be used for the reverse-proxying?
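A minimal Python sketch of the version check and routing described above, assuming the client sends its Beam version as a request metadata entry. The metadata key `beam-version`, the `BACKENDS` table and its addresses, and `route_expansion_request` are hypothetical; a real deployment would read the value from the gRPC call's metadata in a proxy or interceptor, which is not shown here.

```python
from typing import Mapping

# Hypothetical backends hosted behind one hostname, keyed by the Beam
# version each expansion service was built with.
BACKENDS = {
    "2.31.0": "expansion-2-31.internal:8097",
    "2.32.0": "expansion-2-32.internal:8097",
}


class VersionMismatchError(Exception):
    """Raised when no backend exactly matches the client's Beam version."""


def route_expansion_request(metadata: Mapping[str, str]) -> str:
    """Return the backend address for the client's Beam version.

    Applies the strictest rule from the thread: the expansion service
    version must equal the client version exactly.
    """
    client_version = metadata.get("beam-version")
    if client_version is None:
        raise VersionMismatchError("request carries no Beam version")
    if client_version not in BACKENDS:
        raise VersionMismatchError(
            f"no expansion service for Beam {client_version}; "
            f"available: {', '.join(sorted(BACKENDS))}"
        )
    return BACKENDS[client_version]


# Usage: a client on 2.32.0 is routed to the matching backend.
print(route_expansion_request({"beam-version": "2.32.0"}))
# expansion-2-32.internal:8097
```

Exact matching is the safest rule but implies one backend per release; a looser policy (for example, matching only major.minor) would trade some safety for fewer hosted instances.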
>
>
>     Thanks,
>     Cham
>
>
>
>         On Tue, Aug 17, 2021 at 12:38 PM Chamikara Jayalath
>         <chamikara@google.com> wrote:
>
>
>
>             On Tue, Aug 17, 2021 at 11:52 AM Luke Cwik
>             <lcwik@google.com> wrote:
>
>                 Thanks, I was able to finally take a look.
>
>                 I totally agree that this would be applicable to any
>                 language, so Java-specific idioms could be replaced with
>                 general language concepts, but I think the risk is that
>                 no hosted expansion service would want to support an
>                 unchecked "call this method with this parameter" feature
>                 since it is too large a security risk. Code/features
>                 like this are a common reason for CVEs being created.
>
>
>             Actually the discussion regarding generalization of the
>             payload for all languages evolved a bit [1] in the doc.
>
>             I think the invocation patterns of different languages are
>             different enough to warrant different URNs
>             (PayloadTypeUrns) and payload types for instantiating
>             transforms implemented in different languages. For
>             example, Java requires a class name and builder methods.
>             Parameters of the constructor have to be ordered.
>             Python usually uses keyword arguments where the ordering
>             doesn't matter. Go, meanwhile, does not have a concept of
>             classes.
>
>             Regarding vulnerabilities, I think one solution might be
>             to introduce some sort of authentication mechanism for
>             ExpansionServices so that expansion requests can be
>             properly authenticated. Currently we only use expansion
>             services that are local processes, so I think this can be
>             left out of this proposal, but this is something we should
>             add to properly support remote expansion services.
>
>             Thanks,
>             Cham
>
>             [1]
>             https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?disco=AAAANjeR9CU
>
>
>                 On Tue, Aug 17, 2021 at 11:03 AM Chamikara Jayalath
>                 <chamikara@google.com> wrote:
>
>                     Thanks for all the comments in the doc.
>
>                     I created [1] for tracking and opened up a pull
>                     request for proto and Java updates:
>                     https://github.com/apache/beam/pull/15343
>
>                     Thanks,
>                     Cham
>
>                     [1]
>                     https://issues.apache.org/jira/browse/BEAM-12769
>
>                     On Tue, Jul 27, 2021 at 6:28 PM Robert Bradshaw
>                     <robertwb@google.com> wrote:
>
>                         On Tue, Jul 27, 2021 at 1:31 PM Chamikara
>                         Jayalath <chamikara@google.com> wrote:
>                         >
>                         > On Tue, Jul 27, 2021 at 11:03 AM Andrew
>                         Pilloud <apilloud@google.com> wrote:
>                         >>
>                         >> Hi Cham,
>                         >>
>                         >> Are you aware of the SchemaIO and
>                         SchemaIOProvider interfaces (and their
>                         design)? One of SchemaIO's goals is putting a
>                         generic interface on IOs so users don't have
>                         to construct wrappers for cross-language use.
>                         It looks like this new interface can probably
>                         construct a Java SchemaIO, so it sounds
>                         reasonable to me. (This might be something
>                         worth testing when you implement it.)
>                         >>
>                         >> We are starting to add additional
>                         functionality (support for automatic
>                         optimizations, such as filter and project
>                         push-down). I'm not sure how this is going to
>                         work cross language yet, but we will probably
>                         end up adding metadata needed to reconstruct
>                         the transform to the portability proto.
>                         >
>                         > Went through it a bit and I think the two
>                         designs are complementary. Schema-aware IO
>                         will let some I/O transform authors make their
>                         transforms easily accessible from a remote SDK
>                         using a SQL query, while the current proposal
>                         makes defining/using Java transforms easier
>                         for non-Java programmers. I think both
>                         proposals will help reduce the barrier to
>                         entry for cross-language and will help make
>                         more Java transforms available to other SDKs.
>
>                         SQL benefits from being able to declare an IO
>                         in textual form.
>                         Cross-language seeks to establish a standard
>                         to describe an IO in a
>                         language-agnostic form. At their core is the
>                         desire to be able to
>                         instantiate an IO based on a name (which is
>                         likely linked to an
>                         implementation via a registrar) and a set of
>                         named parameters of
>                         "basic" type. I would hope that any IO that
>                         offers a generic SchemaIO
>                         interface will be trivially wrappable as an
>                         external transform.
>
>                         I do agree, however, that this external
>                         transform is more general than
>                         just IOs and transforms accepting/providing
>                         Row types.
>

Re: A simpler way to define and use Java cross-language transforms

Posted by Robert Bradshaw <ro...@google.com>.
On Tue, Aug 17, 2021 at 1:28 PM Chamikara Jayalath <ch...@google.com> wrote:
>
> On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lc...@google.com> wrote:
>>
>> I see the language differences but still feel as though there is a pretty common base that would work for object oriented languages and another for non-object oriented languages.
>
> For now, the only property I think that can be clearly moved to a common base is "class_name". I felt like adding a base just for that is overkill but any suggestions to the PR are welcome :)

Would there be any advantages in having such a common base? Even
though Python offers OO, the natural API is not class-based, but
rather to provide the fully qualified name of a callable + keyword
arguments.
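For comparison, the Python-style payload described above (a fully qualified callable name plus keyword arguments) can be resolved with the standard library alone. A minimal sketch, assuming the name refers to a module-level attribute; `resolve_callable` and `instantiate` are hypothetical helpers, not Beam APIs.

```python
import importlib
from typing import Any, Callable, Mapping


def resolve_callable(fully_qualified_name: str) -> Callable[..., Any]:
    """Import 'package.module.attr' and return the attribute it names."""
    module_name, _, attr_name = fully_qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted name, got {fully_qualified_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def instantiate(fully_qualified_name: str, kwargs: Mapping[str, Any]) -> Any:
    """Call the named callable with keyword arguments (order is irrelevant)."""
    return resolve_callable(fully_qualified_name)(**kwargs)


# Usage: the keyword order differs from the signature order, which is fine.
half = instantiate("fractions.Fraction", {"denominator": 4, "numerator": 2})
print(half)  # 1/2
```

Unlike the Java-style payload there is no ordered argument list or builder chain to encode. This simple split does not handle nested attributes such as `Outer.Inner`, since everything before the last dot is imported as a module.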

>> Authentication won't provide the right type of protection. For example if GCP hosted an expansion service, any GCP customer should be able to authenticate to use it but that wouldn't mean that GCP would want arbitrary code to be executed. We could have an allowlist of classes and methods that are able to be invoked via this pattern.
>
> Yeah, additional authentication mechanisms can be introduced to make this safer. I think the bottom line is that the ability to invoke Java transforms without introducing new Java code can be appealing to many non-Java cross-language transform users. Also, the proposed solution does not let users execute arbitrary Java code. They are simply invoking classes/methods that are already available in the expansion service in a controlled way. We can introduce additional authentication mechanisms, allowlists etc. to make this even safer.

Well, the user could invoke "java.lang.Runtime.exec" as if it were a
transform constructor. A hosted expansion service would likely want to
do sandboxing/allowlisting, but for now the goal is to easily expose
arbitrary Java transforms to a non-Java user.
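One way a hosted service could do this allowlisting is to validate every requested (class, method) pair before any reflective call is made. A minimal Python sketch, assuming the payload has already been decoded into `Invocation` records; the record type, the `Allowlist` class, and the specific entries are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Invocation:
    """One requested call: the class it targets and the method name."""
    class_name: str
    method_name: str


@dataclass(frozen=True)
class Allowlist:
    """(fully qualified class, method) pairs the service agrees to invoke."""
    allowed: FrozenSet[Tuple[str, str]]

    def check(self, invocations: Iterable[Invocation]) -> None:
        # Reject the whole request before anything is invoked reflectively.
        for inv in invocations:
            if (inv.class_name, inv.method_name) not in self.allowed:
                raise PermissionError(
                    f"{inv.class_name}.{inv.method_name} is not allowlisted")


BQ = "org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO"
ALLOWLIST = Allowlist(frozenset({
    (BQ, "readTableRows"),
    (BQ + "$TypedRead", "from"),  # readTableRows() returns a TypedRead
}))

# Usage: the BigQuery read chain passes; Runtime.exec is rejected.
ALLOWLIST.check([Invocation(BQ, "readTableRows"),
                 Invocation(BQ + "$TypedRead", "from")])
try:
    ALLOWLIST.check([Invocation("java.lang.Runtime", "exec")])
except PermissionError as err:
    print(err)  # java.lang.Runtime.exec is not allowlisted
```

Checking the full chain up front means a rejected request never reaches reflection at all.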

>> Finally, this solution does break the abstraction of "hey I want to execute BigQuery read with these parameters" since the current proposal is about how to construct such a transform via some method calls. I believe this will expose more sharp edges around pipeline author and expansion service versioning issues and places the onus onto the pipeline author or expansion service to not break anything.
>
>
> It does support instantiating a transform using constructor parameters or a constructor method with parameters, and builder methods. For example, BigQueryIO.readTableRows().from(String tableSpec), where "BigQueryIO" is the class name, "readTableRows" is the constructor method, and "from(String tableSpec)" is a builder method. Did I miss any common pattern in addition to this?
>
> Thanks,
> Cham
>

Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Aug 17, 2021 at 2:04 PM Robert Bradshaw <ro...@google.com> wrote:

> On Tue, Aug 17, 2021 at 1:40 PM Luke Cwik <lc...@google.com> wrote:
> >
> > On Tue, Aug 17, 2021 at 1:28 PM Chamikara Jayalath <ch...@google.com>
> wrote:
> >>
> >> On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lc...@google.com> wrote:
> >>>
> >>> Finally, this solution does break the abstraction of "hey I want to
> execute BigQuery read with these parameters" since the current proposal is
> about how to construct such a transform via some method calls. I believe
> this will expose more sharp edges around pipeline author and expansion
> service versioning issues and places the onus onto the pipeline author or
> expansion service to not break anything.
> >>
> >> It does support instantiating a transform using constructor parameters
> or a constructor method with parameters, and builder methods. For example,
> BigQueryIO.readTableRows().from(String tableSpec), where "BigQueryIO" is
> the class name, "readTableRows" is the constructor method, and "from(String
> tableSpec)" is a builder method. Did I miss any common pattern in addition
> to this?
> >
> > The issue isn't that you support building a BigQueryIO transform for
> reading, it is that the way that the proto defines the XLang transform is
> coupled directly to the code in how the transform is built. For example, if
> someone wanted to create a C++ or Python version of the expansion service,
> then that someone would need to translate the Java code directly with all
> of its methods. The alternative is that every XLang transform has a
> specification and there is a specification-to-transform adapter that sits
> between the specification and the implementation of how that transform is
> constructed. This level of indirection provides a lot of value related to
> renames of fields/methods, versioning, implementations in other languages
> and security.
>
> I think this hits at the crux of the issue. There are two related, but
> not identical, problems that users may be trying to solve.
>
> (1) I am a Transform author, and want to make my transform available
> to users of all languages
> (2) I am a Transform user, and want to use a transform only available
> in language X.
>
> There is also the use case of
>
> (3) I am a platform provider, and wish to provide alternative
> implementations of specific transforms.
>
> For (3), I think the best way forward is enriching more of our
> composites with URNs and payloads that logically describe their intent,
> which can be swapped out in the runner as an optimization phase. This is
> tangential to external transform expansion (except, possibly, if the
> external definition is preserved and sufficient to be this logical
> description, though one would generally like to have all the same data
> whether it was called via the external expansion API or inline in the
> language itself).
>
> Forcing people who want to do (2) to first solve (1), especially for a
> long tail of connectors, is what this proposal tries to avoid. It also
> somewhat addresses (1), as the current mechanism for exposing these
> things is fairly verbose (e.g., a pipeline author provides one
> (hopefully carefully designed) API for Java, and then needs to provide
> another for "everyone else").
>
> (I'll also note that this API, while subject to change, is just as
> stable as the native API is for users of the native SDK.)
>
>
I think having correct abstractions in SDKs that use cross-language
transforms defined this way will make the experience more seamless for
remote SDK users.
Currently, my plan is to introduce an abstraction (for example, in Python)
where the pipeline author will have to directly specify the class name,
builder methods, etc.
If the Java API changes, corresponding Python pipelines will have to change
(just like any Java pipelines that use the same transform API will have to
change). We have to make sure that we do not expose any private
constructors, private methods, etc. through the API (even though Java
reflection supports that), and that any invocation errors are propagated
back to the pipeline SDK.
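A minimal Python sketch of such an abstraction, assuming the payload is a plain dict and that the first recorded call is the constructor method while the rest are builder methods. `JavaTransformSpec`, its field names, and the trailing-underscore alias for Java method names that are Python keywords (such as `from`) are hypothetical and not the API in the linked PR.

```python
from typing import Any, Dict, List, Tuple


class JavaTransformSpec:
    """Records a Java construction chain as data instead of running it.

    JavaTransformSpec(cls).readTableRows().from_(spec) records the same
    chain as the Java expression cls.readTableRows().from(spec).
    """

    def __init__(self, class_name: str):
        self._class_name = class_name
        self._calls: List[Tuple[str, Tuple[Any, ...]]] = []

    def __getattr__(self, name: str):
        # Only reached for names not defined on this class.
        if name.startswith("_"):
            raise AttributeError(name)
        # Python keywords such as "from" are written with a trailing "_".
        java_name = name[:-1] if name.endswith("_") else name

        def record(*args: Any) -> "JavaTransformSpec":
            # Java arguments are positional, so keep them in order.
            self._calls.append((java_name, args))
            return self

        return record

    def to_payload(self) -> Dict[str, Any]:
        """First call is the constructor method; the rest are builders."""
        if not self._calls:
            raise ValueError("no constructor method recorded")
        (ctor_name, ctor_args), *builders = self._calls
        return {
            "class_name": self._class_name,
            "constructor_method": ctor_name,
            "constructor_args": list(ctor_args),
            "builder_methods": [
                {"name": name, "args": list(args)} for name, args in builders
            ],
        }


spec = JavaTransformSpec("org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO")
payload = spec.readTableRows().from_("my-project:dataset.table").to_payload()
print(payload["builder_methods"])
# [{'name': 'from', 'args': ['my-project:dataset.table']}]
```

Nothing here validates names against the Java class, so a misspelled builder method would only fail at expansion time; that is why propagating invocation errors back to the pipeline SDK matters.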

In the future, it might make sense to introduce a way to discover the API
of a transform via an RPC and add support for generating a stub class on
the Python/Go side that mimics the updated Java transform.
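A rough Python sketch of the stub-generation idea, assuming a hypothetical discovery RPC that returns each public method name with its ordered parameter names. `make_stub_class` and the shape of `api` are illustrative only; for brevity the sketch flattens static constructor methods and instance builder methods into one class.

```python
import keyword
from typing import Any, Dict, List, Tuple


def make_stub_class(class_name: str, api: Dict[str, List[str]]) -> type:
    """Build a Python class whose methods mirror a discovered Java API."""

    def make_method(java_name: str, params: List[str]):
        def method(self, *args: Any):
            # Catch arity mistakes locally instead of at expansion time.
            if len(args) != len(params):
                raise TypeError(
                    f"{java_name}() expects {len(params)} argument(s) "
                    f"({', '.join(params) or 'none'}), got {len(args)}")
            self.calls.append((java_name, args))
            return self
        method.__name__ = java_name
        return method

    def __init__(self):
        self.calls: List[Tuple[str, Tuple[Any, ...]]] = []

    namespace: Dict[str, Any] = {"__init__": __init__, "java_class": class_name}
    for java_name, params in api.items():
        # Python keywords such as "from" get a trailing underscore.
        py_name = java_name + "_" if keyword.iskeyword(java_name) else java_name
        namespace[py_name] = make_method(java_name, params)
    short_name = class_name.rsplit(".", 1)[-1]
    return type(short_name + "Stub", (object,), namespace)


BigQueryIOStub = make_stub_class(
    "org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO",
    {"readTableRows": [], "from": ["tableSpec"]},
)
stub = BigQueryIOStub().readTableRows().from_("my-project:dataset.table")
print(stub.calls)
# [('readTableRows', ()), ('from', ('my-project:dataset.table',))]
```

Regenerating the stub whenever the discovered API changes would surface renamed or removed Java methods as Python `AttributeError`s at pipeline construction time.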

Thanks,
Cham



>
> I will say that being able to define a PTransform as an identifier + a
> set of named parameters would be ideal for other reasons as well (e.g.
> GUI builders come to mind). The fact that Java encourages the builder
> pattern is what makes these manual bridges necessary. Another approach
> that would be interesting to take is if one could decorate a Java
> transform that follows the "conventional" builder pattern with an
> annotation that would map the identifier + properties to the
> appropriate constructor + setters automatically.
>
> - Robert
>

Re: A simpler way to define and use Java cross-language transforms

Posted by Robert Bradshaw <ro...@google.com>.
On Tue, Aug 17, 2021 at 1:40 PM Luke Cwik <lc...@google.com> wrote:
>
> On Tue, Aug 17, 2021 at 1:28 PM Chamikara Jayalath <ch...@google.com> wrote:
>>
>> On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lc...@google.com> wrote:
>>>
>>> Finally, this solution does break the abstraction of "hey I want to execute BigQuery read with these parameters" since the current proposal is about how to construct such a transform via some method calls. I believe this will expose more sharp edges around pipeline author and expansion service versioning issues and places the onus onto the pipeline author or expansion service to not break anything.
>>
>> It does support instantiating a transform using constructor parameters or a constructor method with parameters, and builder methods. For example, BigQueryIO.readTableRows().from(String tableSpec), where "BigQueryIO" is the class name, "readTableRows" is the constructor method, and "from(String tableSpec)" is a builder method. Did I miss any common pattern in addition to this?
>
> The issue isn't that you support building a BigQueryIO transform for reading, it is that the way that the proto defines the XLang transform is coupled directly to the code in how the transform is built. For example, if someone wanted to create a C++ or Python version of the expansion service, then that someone would need to translate the Java code directly with all of its methods. The alternative is that every XLang transform has a specification and there is a specification-to-transform adapter that sits between the specification and the implementation of how that transform is constructed. This level of indirection provides a lot of value related to renames of fields/methods, versioning, implementations in other languages and security.

I think this hits at the crux of the issue. There are two related, but
not identical, problems that users may be trying to solve.

(1) I am a Transform author, and want to make my transform available
to users of all languages
(2) I am a Transform user, and want to use a transform only available
in language X.

There is also the use case of

(3) I am a platform provider, and wish to provide alternative
implementations of specific transforms.

For (3), I think the best way forward is enriching more of our
composites with URNs and payloads that logically describe their intent,
which can be swapped out in the runner as an optimization phase. This is
tangential to external transform expansion (except, possibly, if the
external definition is preserved and sufficient to be this logical
description, though one would generally like to have all the same data
whether it was called via the external expansion API or inline in the
language itself).
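A toy Python sketch of the optimization phase described for case (3), assuming each composite carries a URN and a payload that describe its intent. The `Transform` type, the `example:` URNs, and `native_bigquery_read` are hypothetical; a real runner would operate on the pipeline proto rather than a flat list.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Transform:
    """Toy stand-in for a composite transform in a pipeline."""
    name: str
    urn: str
    payload: Dict[str, str] = field(default_factory=dict)
    implementation: str = "sdk-expansion"


Replacement = Callable[[Transform], Transform]


def optimize(transforms: List[Transform],
             replacements: Dict[str, Replacement]) -> List[Transform]:
    """Swap in platform implementations for composites with a known URN."""
    return [replacements[t.urn](t) if t.urn in replacements else t
            for t in transforms]


READ_URN = "example:bigquery_read:v1"  # hypothetical URN


def native_bigquery_read(t: Transform) -> Transform:
    # The logical description (payload) is kept; only the implementation changes.
    return replace(t, implementation="platform-native")


pipeline = [
    Transform("ReadOrders", READ_URN, {"table": "my-project:dataset.orders"}),
    Transform("ParseRows", "example:pardo:v1"),
]
optimized = optimize(pipeline, {READ_URN: native_bigquery_read})
print([t.implementation for t in optimized])
# ['platform-native', 'sdk-expansion']
```

Because the replacement keys off the logical URN and payload, it works the same whether the composite was expanded via the external expansion API or constructed inline in the SDK.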

Forcing people who want to do (2) to first solve (1), especially for a
long tail of connectors, is what this proposal tries to avoid. It also
somewhat addresses (1), as the current mechanism for exposing these
things is fairly verbose (e.g., a pipeline author provides one
(hopefully carefully designed) API for Java, and then needs to provide
another for "everyone else").

(I'll also note that this API, while subject to change, is just as
stable as the native API is for users of the native SDK.)


I will say that being able to define a PTransform as an identifier + a
set of named parameters would be ideal for other reasons as well (e.g.
GUI builders come to mind). The fact that Java encourages the builder
pattern is what makes these manual bridges necessary. Another approach
that would be interesting to take is if one could decorate a Java
transform that follows the "conventional" builder pattern with an
annotation that would map the identifier + properties to the
appropriate constructor + setters automatically.
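Python has no Java annotations, but a class decorator can approximate this idea: register a builder-style class under an identifier, then build it from named properties by calling one setter per property. A minimal sketch, assuming a no-argument constructor and `with_<property>` setters that return the (possibly new) builder; `external_transform`, `build`, `ReadText`, and the identifier are hypothetical.

```python
from typing import Any, Callable, Dict, Type

_REGISTRY: Dict[str, Type[Any]] = {}


def external_transform(identifier: str) -> Callable[[Type[Any]], Type[Any]]:
    """Class decorator: make a builder-style class constructible by identifier."""
    def register(cls: Type[Any]) -> Type[Any]:
        _REGISTRY[identifier] = cls
        return cls
    return register


def build(identifier: str, properties: Dict[str, Any]) -> Any:
    """Call the no-arg constructor, then with_<name>(value) for each property."""
    obj = _REGISTRY[identifier]()
    for prop, value in properties.items():
        setter = getattr(obj, f"with_{prop}", None)
        if setter is None:
            raise ValueError(f"{identifier} has no setter for {prop!r}")
        obj = setter(value)  # builder setters may return a new instance
    return obj


@external_transform("example:read_text:v1")  # hypothetical identifier
class ReadText:
    def __init__(self, path: str = "", compression: str = "auto"):
        self.path = path
        self.compression = compression

    def with_path(self, path: str) -> "ReadText":
        return ReadText(path, self.compression)

    def with_compression(self, compression: str) -> "ReadText":
        return ReadText(self.path, compression)


read = build("example:read_text:v1",
             {"path": "/tmp/input.txt", "compression": "gzip"})
print(read.path, read.compression)  # /tmp/input.txt gzip
```

The identifier + named-properties form is also what a GUI builder would naturally produce, while the class keeps its conventional builder API for native users.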

- Robert

Re: A simpler way to define and use Java cross-language transforms

Posted by Luke Cwik <lc...@google.com>.
On Tue, Aug 17, 2021 at 1:28 PM Chamikara Jayalath <ch...@google.com>
wrote:

>
>
> On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lc...@google.com> wrote:
>
>> I see the language differences but still feel as though there is a pretty
>> common base that would work for object oriented languages and another for
>> non-object oriented languages.
>>
>
> For now, the only property I think that can be clearly moved to a common
> base is "class_name". I felt like adding a base just for that is overkill
> but any suggestions to the PR are welcome :)
>

>
>>
>> Authentication won't provide the right type of protection. For example if
>> GCP hosted an expansion service, any GCP customer should be able to
>> authenticate to use it but that wouldn't mean that GCP would want arbitrary
>> code to be executed. We could have an allowlist of classes and methods that
>> are able to be invoked via this pattern.
>>
>
> Yeah, additional authentication mechanisms can be introduced to make this
> safer. I think the bottom line is that the ability to invoke Java
> transforms without introducing new Java code can be appealing to many
> non-Java cross-language transform users. Also, the proposed solution does
> not let users execute arbitrary Java code. They are simply invoking
> classes/methods that are already available in the expansion service in a
> controlled way. We can introduce additional authentication mechanisms,
> allowlists etc. to make this even safer.
>
>
>>
>> Finally, this solution does break the abstraction of "hey I want to
>> execute BigQuery read with these parameters" since the current proposal is
>> about how to construct such a transform via some method calls. I believe
>> this will expose more sharp edges around pipeline author and expansion
>> service versioning issues and places the onus onto the pipeline author or
>> expansion service to not break anything.
>>
>
> It does support instantiating a transform using constructor parameters or
> a constructor method with parameters, and builder methods. For example,
> BigQueryIO.readTableRows().from(String tableSpec), where "BigQueryIO" is
> the class name, "readTableRows" is the constructor method, and "from(String
> tableSpec)" is a builder method. Did I miss any common pattern in addition
> to this?
>

The issue isn't that you support building a BigQueryIO transform for
reading, it is that the way that the proto defines the XLang transform is
coupled directly to the code in how the transform is built. For example, if
someone wanted to create a C++ or Python version of the expansion service,
then that someone would need to translate the Java code directly with all
of its methods. The alternative is that every XLang transform has a
specification and there is a specification-to-transform adapter that sits
between the specification and the implementation of how that transform is
constructed. This level of indirection provides a lot of value related to
renames of fields/methods, versioning, implementations in other languages
and security.
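A minimal Python sketch of the proposed indirection: the request names a versioned spec and its parameters, and a registered adapter maps that spec onto the current implementation. `TransformSpec`, the `adapter` decorator, `ReadImpl` (including its "renamed" method), and the URN are hypothetical; the point is that renaming the implementation's method only touches the adapter, not the spec callers send.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping


@dataclass(frozen=True)
class TransformSpec:
    """Language-agnostic request: a versioned URN plus named parameters."""
    urn: str
    params: Mapping[str, Any]


Adapter = Callable[[Mapping[str, Any]], Any]
_ADAPTERS: Dict[str, Adapter] = {}


def adapter(urn: str) -> Callable[[Adapter], Adapter]:
    """Register the function that maps one spec onto an implementation."""
    def register(fn: Adapter) -> Adapter:
        _ADAPTERS[urn] = fn
        return fn
    return register


def expand(spec: TransformSpec) -> Any:
    if spec.urn not in _ADAPTERS:
        raise ValueError(f"unsupported transform spec {spec.urn!r}")
    return _ADAPTERS[spec.urn](spec.params)


class ReadImpl:
    """Hypothetical implementation whose builder method was renamed."""

    def __init__(self) -> None:
        self.table = None

    def from_table(self, table: str) -> "ReadImpl":  # hypothetically from_spec() before
        self.table = table
        return self


@adapter("example:bigquery_read:v1")  # hypothetical URN
def _bigquery_read_v1(params: Mapping[str, Any]) -> ReadImpl:
    # Callers still send "table"; only this adapter knows the method name.
    return ReadImpl().from_table(params["table"])


read = expand(TransformSpec("example:bigquery_read:v1",
                            {"table": "my-project:dataset.table"}))
print(read.table)  # my-project:dataset.table
```

The same spec could be served by an adapter written in another language, which is the portability benefit argued for above; the cost is one adapter per transform.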



Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Aug 17, 2021 at 1:01 PM Luke Cwik <lc...@google.com> wrote:

> I see the language differences but still feel as though there is a pretty
> common base that would work for object oriented languages and another for
> non-object oriented languages.
>

For now, the only property I think can clearly be moved to a common base
is "class_name". I felt that adding a base just for that would be overkill,
but any suggestions on the PR are welcome :)


>
> Authentication won't provide the right type of protection. For example, if
> GCP hosted an expansion service, any GCP customer should be able to
> authenticate to use it, but that wouldn't mean that GCP would want arbitrary
> code to be executed. We could have an allowlist of classes and methods that
> are able to be invoked via this pattern.
>

Yeah, additional authentication mechanisms can be introduced to make this
safer. I think the bottom line is that the ability to invoke Java
transforms without introducing new Java code can be appealing to many
non-Java cross-language transform users. Also, the proposed solution does
not let users execute arbitrary Java code. They are simply invoking
classes/methods that are already available in the expansion service in a
controlled way. Allowlists and similar safeguards can make this even safer.


>
> Finally, this solution does break the abstraction of "hey, I want to
> execute BigQuery read with these parameters", since the current proposal is
> about how to construct such a transform via some method calls. I believe
> this will expose more sharp edges around pipeline author and expansion
> service versioning issues, and it places the onus on the pipeline author or
> expansion service to not break anything.
>

It does support instantiating a transform using constructor parameters, or a
constructor method with parameters, followed by builder methods. For example,
in BigQueryIO.readTableRows().from(String tableSpec), "BigQueryIO" is the
class name, "readTableRows" is the constructor method, and "from(String
tableSpec)" is a builder method. Did I miss any common pattern in addition
to this?
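A rough sketch of how an expansion service might interpret such a payload with Java reflection: load the class by name, call the static constructor method, then apply each builder method to the returned object in order. `ToyIO`, `ReflectiveBuilder` and `Call` are hypothetical stand-ins (`ToyIO` only mimics the `readTableRows().from(...)` shape of BigQueryIO), and the actual PR may resolve methods differently, for example by matching argument types rather than only arity.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.List;

// Hypothetical transform mimicking the BigQueryIO.readTableRows().from(...) shape.
class ToyIO {
  public static Read readRows() {
    return new Read(null);
  }

  static class Read {
    final String tableSpec;

    Read(String tableSpec) {
      this.tableSpec = tableSpec;
    }

    // Builder method: returns a new, updated instance.
    public Read from(String tableSpec) {
      return new Read(tableSpec);
    }
  }
}

class ReflectiveBuilder {
  // One method call from the payload: a method name plus positional arguments.
  static final class Call {
    final String method;
    final Object[] args;

    Call(String method, Object... args) {
      this.method = method;
      this.args = args;
    }
  }

  static Object build(String className, Call constructorMethod, List<Call> builderMethods)
      throws Exception {
    Class<?> clazz = Class.forName(className);
    // Static "constructor method", e.g. readTableRows().
    Object current = invoke(clazz, null, constructorMethod, true);
    // Each builder method is applied to the result of the previous call, in order.
    for (Call call : builderMethods) {
      current = invoke(current.getClass(), current, call, false);
    }
    return current;
  }

  // Picks the first public method with a matching name, arity and static-ness.
  // Overloads with the same arity are not disambiguated in this sketch.
  private static Object invoke(Class<?> clazz, Object target, Call call, boolean isStatic)
      throws Exception {
    for (Method m : clazz.getMethods()) {
      if (m.getName().equals(call.method)
          && m.getParameterCount() == call.args.length
          && Modifier.isStatic(m.getModifiers()) == isStatic) {
        return m.invoke(target, call.args);
      }
    }
    throw new NoSuchMethodException(clazz.getName() + "." + call.method);
  }
}
```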

Thanks,
Cham



Re: A simpler way to define and use Java cross-language transforms

Posted by Luke Cwik <lc...@google.com>.
I see the language differences, but I still feel there is a fairly common
base that would work for object-oriented languages and another for
non-object-oriented languages.

Authentication won't provide the right type of protection. For example, if
GCP hosted an expansion service, any GCP customer should be able to
authenticate to use it, but that wouldn't mean that GCP would want arbitrary
code to be executed. We could have an allowlist of classes and methods that
are able to be invoked via this pattern.
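One way to read the allowlist suggestion, sketched under assumptions: before any reflective call, the expansion service checks the (class, method) pair against an explicitly configured set and fails closed. `InvocationAllowlist` is a hypothetical class; how such an allowlist would actually be configured or distributed is not specified in this thread.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical allowlist checked before the expansion service makes any reflective call.
class InvocationAllowlist {
  // Fully qualified class name -> method names that may be invoked on it.
  private final Map<String, Set<String>> allowed;

  InvocationAllowlist(Map<String, Set<String>> allowed) {
    this.allowed = allowed;
  }

  boolean isAllowed(String className, String methodName) {
    Set<String> methods = allowed.get(className);
    return methods != null && methods.contains(methodName);
  }

  // Fails closed: anything not explicitly listed is rejected.
  void check(String className, String methodName) {
    if (!isAllowed(className, methodName)) {
      throw new SecurityException("Not allowlisted: " + className + "#" + methodName);
    }
  }
}
```

Unlike authentication, this limits *what* an authenticated caller can invoke, which is the distinction drawn above.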

Finally, this solution does break the abstraction of "hey, I want to execute
BigQuery read with these parameters", since the current proposal is about
how to construct such a transform via some method calls. I believe this
will expose more sharp edges around pipeline author and expansion service
versioning issues, and it places the onus on the pipeline author or expansion
service to not break anything.



Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Aug 17, 2021 at 11:52 AM Luke Cwik <lc...@google.com> wrote:

> Thanks, I was able to finally take a look.
>
> I totally agree that this would be applicable to any language, so replacing
> Java-specific idioms with general language concepts makes sense, but I think
> the risk is that no hosted expansion service would want to support unchecked
> "call this method with this parameter" requests, since that is too large a
> security risk. Code/features like this are a common reason for CVEs being created.
>

Actually, the discussion regarding generalization of the payload for all
languages evolved a bit [1] in the doc.

I think the invocation patterns of different languages are different enough
to warrant different URNs (PayloadTypeUrns) and payload types for
instantiating transforms implemented in different languages. For example,
Java requires a class name and builder methods, and constructor parameters
have to be ordered. Python usually uses keyword arguments, where the
ordering doesn't matter. Go, meanwhile, does not have a concept of
classes.
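To illustrate why the ordering difference matters for payload design, here is a hedged sketch of two payload shapes: a Java-style one that keeps constructor arguments and builder calls in ordered lists, and a Python-style one that keeps keyword arguments in a map. Both classes are hypothetical and are not the proto messages from the PR; they only show that `List.equals` is order-sensitive while `Map.equals` is not.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical Java-style payload: constructor arguments and builder calls are
// positional, so they are stored as ordered lists.
final class JavaStylePayload {
  final String className;
  final List<Object> constructorArgs;
  final List<String> builderMethods; // method names in call order (arguments omitted for brevity)

  JavaStylePayload(String className, List<Object> constructorArgs, List<String> builderMethods) {
    this.className = className;
    this.constructorArgs = constructorArgs;
    this.builderMethods = builderMethods;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof JavaStylePayload)) return false;
    JavaStylePayload p = (JavaStylePayload) o;
    // List.equals compares element by element, so argument order matters.
    return className.equals(p.className)
        && constructorArgs.equals(p.constructorArgs)
        && builderMethods.equals(p.builderMethods);
  }

  @Override
  public int hashCode() {
    return Objects.hash(className, constructorArgs, builderMethods);
  }
}

// Hypothetical Python-style payload: keyword arguments are matched by name,
// so an unordered map is sufficient.
final class PythonStylePayload {
  final String callable;
  final Map<String, Object> kwargs;

  PythonStylePayload(String callable, Map<String, Object> kwargs) {
    this.callable = callable;
    this.kwargs = kwargs;
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof PythonStylePayload)) return false;
    PythonStylePayload p = (PythonStylePayload) o;
    // Map.equals ignores insertion order.
    return callable.equals(p.callable) && kwargs.equals(p.kwargs);
  }

  @Override
  public int hashCode() {
    return Objects.hash(callable, kwargs);
  }
}
```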

Regarding vulnerabilities, I think one solution might be to introduce some
sort of authentication mechanism for ExpansionServices so that expansion
requests can be properly authenticated. Currently we only use expansion
services that are local processes, so I think this can be left out of this
proposal, but it is something we should add to properly support remote
expansion services.

Thanks,
Cham

[1]
https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?disco=AAAANjeR9CU



Re: A simpler way to define and use Java cross-language transforms

Posted by Luke Cwik <lc...@google.com>.
Thanks, I was able to finally take a look.

I totally agree that this would be applicable to any language, so replacing
Java-specific idioms with general language concepts makes sense, but I think
the risk is that no hosted expansion service would want to support unchecked
"call this method with this parameter" requests, since that is too large a
security risk. Code/features like this are a common reason for CVEs being
created.


Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
Thanks for all the comments in the doc.

I created [1] for tracking and opened up a pull request for proto and Java
updates: https://github.com/apache/beam/pull/15343

Thanks,
Cham

[1] https://issues.apache.org/jira/browse/BEAM-12769


Re: A simpler way to define and use Java cross-language transforms

Posted by Brian Hulette <bh...@google.com>.
> I would hope that any IO that offers a generic SchemaIO
> interface will be trivially wrappable as an external transform.

This is a bit of an aside, but I just wanted to point out that this was a
primary goal of the initial SchemaIO project. The idea was to allow Java IO
developers to implement a single interface to make IOs usable from SQL and
from other SDKs.
To that end, Scott created ExternalSchemaIOTransformRegistrar [1] to find
SchemaIO implementations with ServiceLoader and register external
transforms for them (one for the read side and one for the write side).

[1]
https://github.com/apache/beam/blob/master/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java
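The discovery pattern described above can be sketched with `java.util.ServiceLoader`, which finds implementations listed under `META-INF/services` on the classpath. `SchemaIOProviderLike`, `SchemaIORegistrarSketch` and the `example:schemaio:<id>:read|write` identifier format are assumptions made for illustration; the real registrar works with Beam's `SchemaIOProvider` and its own identifiers.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical stand-in for a SchemaIO provider interface; not Beam's API.
interface SchemaIOProviderLike {
  String identifier();
}

class SchemaIORegistrarSketch {
  // Registers one read entry and one write entry per provider.
  // The identifier format below is made up for illustration.
  static Map<String, SchemaIOProviderLike> register(Iterable<SchemaIOProviderLike> providers) {
    Map<String, SchemaIOProviderLike> transforms = new LinkedHashMap<>();
    for (SchemaIOProviderLike provider : providers) {
      String base = "example:schemaio:" + provider.identifier();
      transforms.put(base + ":read", provider);
      transforms.put(base + ":write", provider);
    }
    return transforms;
  }

  // Discovers providers declared in META-INF/services files on the classpath.
  static Map<String, SchemaIOProviderLike> registerFromClasspath() {
    return register(ServiceLoader.load(SchemaIOProviderLike.class));
  }
}
```

Separating `register` from the `ServiceLoader` call keeps the mapping logic testable without a service descriptor file.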


Re: A simpler way to define and use Java cross-language transforms

Posted by Robert Bradshaw <ro...@google.com>.
On Tue, Jul 27, 2021 at 1:31 PM Chamikara Jayalath <ch...@google.com> wrote:
>
> On Tue, Jul 27, 2021 at 11:03 AM Andrew Pilloud <ap...@google.com> wrote:
>>
>> Hi Cham,
>>
>> Are you aware of the SchemaIO and SchemaIOProvider interfaces (and their design)? One of SchemaIO's goals is putting a generic interface on IOs so users don't have to construct wrappers for cross-language use. It looks like this new interface can probably construct a Java SchemaIO, so it sounds reasonable to me. (This might be something worth testing when you implement it.)
>>
>> We are starting to add additional functionality (support for automatic optimizations, such as filter and project push-down). I'm not sure how this is going to work cross-language yet, but we will probably end up adding metadata needed to reconstruct the transform to the portability proto.
>
> Went through it a bit and I think the two designs are complementary. Schema-aware IO will allow some I/O transform authors to make their transforms easily accessible from a remote SDK using a SQL query, while the current proposal makes defining/using Java transforms easier for non-Java programmers. I think both proposals will help reduce the barrier to entry for cross-language and will help make more Java transforms available to other SDKs.

SQL benefits from being able to declare an IO in textual form.
Cross-language seeks to establish a standard to describe an IO in a
language-agnostic form. At their core is the desire to be able to
instantiate an IO based on a name (which is likely linked to an
implementation via a registrar) and a set of named parameters of
"basic" type. I would hope that any IO that offers a generic SchemaIO
interface will be trivially wrappable as an external transform.

I do agree, however, that this external transform is more general than
just IOs and transforms accepting/providing Row types.
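A minimal sketch of "instantiate an IO based on a name and a set of named parameters of basic type", assuming a registry that maps names to factories. `NamedTransformRegistry` is hypothetical, and the set of types treated as "basic" here (strings, integers, longs, doubles, booleans, byte arrays) is an assumption for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical registry: a name resolves to a factory that accepts named
// parameters restricted to "basic" types.
class NamedTransformRegistry {
  private final Map<String, Function<Map<String, Object>, Object>> factories = new HashMap<>();

  void register(String name, Function<Map<String, Object>, Object> factory) {
    factories.put(name, factory);
  }

  Object instantiate(String name, Map<String, Object> params) {
    // Reject anything that is not a basic value before calling the factory.
    for (Map.Entry<String, Object> e : params.entrySet()) {
      if (!isBasic(e.getValue())) {
        throw new IllegalArgumentException("Parameter '" + e.getKey() + "' is not a basic type");
      }
    }
    Function<Map<String, Object>, Object> factory = factories.get(name);
    if (factory == null) {
      throw new IllegalArgumentException("No transform registered under " + name);
    }
    return factory.apply(params);
  }

  // Which types count as "basic" is an assumption of this sketch.
  private static boolean isBasic(Object v) {
    return v instanceof String || v instanceof Integer || v instanceof Long
        || v instanceof Double || v instanceof Boolean || v instanceof byte[];
  }
}
```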

Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Jul 27, 2021 at 11:03 AM Andrew Pilloud <ap...@google.com> wrote:

> Hi Cham,
>
> Are you aware of the SchemaIO
> <https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java#L46>
>  and SchemaIOProvider
> <https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java#L41> interfaces
> (and their design
> <https://docs.google.com/document/d/1ic3P8EVGHIydHQ-VMDKbN9kEdwm7sBXMo80VrhwksvI/edit#>)?
> One of SchemaIO's goals is putting a generic interface on IOs so users don't
> have to construct wrappers for cross-language use. It looks like this new
> interface can probably construct a Java SchemaIO, so it sounds reasonable
> to me. (This might be something worth testing when you implement it.)
>
> We are starting to add additional functionality (support for automatic
> optimizations, such as filter and project push-down). I'm not sure how this
> is going to work cross-language yet, but we will probably end up adding
> metadata needed to reconstruct the transform to the portability proto.
>

Went through it a bit and I think the two designs are complementary.
Schema-aware IO will allow some I/O transform authors to make their
transforms easily accessible from a remote SDK using a SQL query, while the
current proposal makes defining/using Java transforms easier for non-Java
programmers. I think both proposals will help reduce the barrier to entry
for cross-language and will help make more Java transforms available to
other SDKs.

Thanks,
Cham



>
> Andrew
>
> On Tue, Jul 27, 2021 at 11:00 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> +1
>> On 7/27/21 7:55 PM, Chamikara Jayalath wrote:
>>
>>
>>
>> On Tue, Jul 27, 2021 at 10:51 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> I agree that adding PipelineOptions is technically (nearly) orthogonal,
>>> it is not exactly independent, because the proposal emphasize the need for
>>> it. On the other hand, given the complexity of the original proposal, I
>>> think that this change is of really low complexity, it might boil down to
>>> adding a "repeated string options" field to the ExpansionRequest (or a map,
>>> or whatever), and then reading it back during the expansion. I think that
>>> taking the opportunity of making a somewhat larger change will be better in
>>> this case, because it will instantly enable a broader usage. But - if we
>>> want to treat it independently, can we create a tracking Jira for that?
>>>
>>
>> We have a tracking Jira for the PipelineOptions part:
>> https://issues.apache.org/jira/browse/BEAM-9449
>>
>>
>>>  Jan
>>> On 7/27/21 7:09 PM, Chamikara Jayalath wrote:
>>>
>>>
>>>
>>> On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>>
>>>> Hi Cham,
>>>>
>>>> I think this approach is great, but what I'm missing is PipelineOptions
>>>> for the expansion. This approach makes it possible (and practical) to use
>>>> a single expansion service for many different Pipelines and Environments - I
>>>> can imagine a "single" (logical) instance of the expansion service per data
>>>> team/organisation (packaging the expansion service with possibly many
>>>> dependencies might be non-trivial), that means that the expansion might
>>>> need customization that is described precisely by PipelineOptions that are
>>>> passed to the Pipeline that is used for the expansion. Other than that this
>>>> looks great.
>>>>
>>>
>>> Good point. I think that sending in the PipelineOptions as a part of the
>>> expansion service is an orthogonal change that can be useful for both
>>> config object based and class/type lookup based expansions. So I'd like to
>>> consider that separately instead of tying into this work.
>>>
>>> Thanks,
>>> Cham
>>>
>>>  Jan
>>>> On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>>>>
>>>>
>>>>
>>>> On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <ro...@frantil.com>
>>>> wrote:
>>>>
>>>>> Looked at it. LGTM
>>>>>
>>>>> It looks portable enough to be useful for any future Go SDK based
>>>>> Expansion services.
>>>>>
>>>>>  I do wonder if there are more general names than "class" but that's a
>>>>> terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
>>>>> inheritance based polymorphism.) Perhaps "type" is a good replacement,
>>>>> which is in use in most languages in one capacity or another?
>>>>>
>>>>
>>>> Yeah, probably we can call it "type" and describe what the field means
>>>> for each SDK in a comment.
>>>>
>>>>
>>>>> Certainly not a hard blocker.
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Currently, to define Java cross-language transforms, users have to
>>>>>> define three new Java classes: a Registrar, a Builder and a Config Object
>>>>>> [1].
>>>>>>
>>>>>> While this might not be too hard for a Java programmer, learning Java
>>>>>> and developing/building/releasing new classes just to use existing Java
>>>>>> transforms may be cumbersome for Python and Go users. To further simplify
>>>>>> the process for defining new Java cross-language transforms and usage of
>>>>>> such transforms from other SDKs I would like to propose an update to the
>>>>>> cross-language transform expansion protocol.
>>>>>>
>>>>>> Please see the following for details and let me know if you have any
>>>>>> comments.
>>>>>>
>>>>>> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>>>>>
>>>>>> Thanks,
>>>>>> Cham
>>>>>>
>>>>>> [1]
>>>>>> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>>>>>
>>>>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Andrew Pilloud <ap...@google.com>.
Hi Cham,

Are you aware of the SchemaIO
<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIO.java#L46>
 and SchemaIOProvider
<https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/sdks/java/core/src/main/java/org/apache/beam/sdk/schemas/io/SchemaIOProvider.java#L41>
interfaces
(and their design
<https://docs.google.com/document/d/1ic3P8EVGHIydHQ-VMDKbN9kEdwm7sBXMo80VrhwksvI/edit#>)?
One of SchemaIO's goals is putting a generic interface on IOs so users don't
have to construct wrappers for cross-language use. It looks like this new
interface can probably construct a Java SchemaIO, so it sounds reasonable
to me. (This might be something worth testing when you implement it.)
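A minimal Python sketch of the idea described above: one generic entry point that looks an IO up by an identifier and configures it from a single row of settings, so no per-IO wrapper has to be written for cross-language use. Everything here (`ConfiguredIO`, `register_provider`, `build_io`, the `jdbc` field names) is hypothetical and only loosely inspired by the linked Java interfaces; it is not their actual API.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical Python stand-ins for illustration only; these are not Beam's
# Java SchemaIO / SchemaIOProvider interfaces.


@dataclass
class ConfiguredIO:
    """An IO instance built from an identifier, a location and a config row."""
    identifier: str
    location: str
    config: Dict[str, Any] = field(default_factory=dict)


# Registry: provider identifier -> builder function.
_PROVIDERS: Dict[str, Callable[[str, Dict[str, Any]], ConfiguredIO]] = {}


def register_provider(identifier: str, required_fields: List[str]) -> None:
    """Register a provider whose configuration must contain required_fields."""

    def build(location: str, config: Dict[str, Any]) -> ConfiguredIO:
        missing = [f for f in required_fields if f not in config]
        if missing:
            raise ValueError(f"{identifier}: missing config fields {missing}")
        return ConfiguredIO(identifier, location, dict(config))

    _PROVIDERS[identifier] = build


def build_io(identifier: str, location: str, config: Dict[str, Any]) -> ConfiguredIO:
    """Single generic entry point: no per-IO wrapper class is needed."""
    if identifier not in _PROVIDERS:
        raise ValueError(f"unknown IO provider {identifier!r}")
    return _PROVIDERS[identifier](location, config)


# Usage: a remote SDK only has to send an identifier, a location and a row.
register_provider("jdbc", ["driver_class_name", "username"])
jdbc_io = build_io(
    "jdbc",
    "jdbc:postgresql://db.example/orders",
    {"driver_class_name": "org.postgresql.Driver", "username": "beam"},
)
```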

We are starting to add additional functionality (support for automatic
optimizations, such as filter and project push-down). I'm not sure how this
is going to work cross language yet, but we will probably end up adding
metadata needed to reconstruct the transform to the portability proto.
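To make "project push-down" concrete, here is a hedged sketch of a source that accepts the fields and the filter the downstream query needs, so unused columns and rejected rows are dropped at read time. `PushDownReader` and its methods are invented for illustration and do not reflect how Beam implements push-down.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional, Sequence

Row = Dict[str, Any]


class PushDownReader:
    """Hypothetical source that accepts pushed-down projections and filters.

    Illustrative only; not Beam's push-down API.
    """

    def __init__(self, rows: Sequence[Row]):
        self._rows = rows
        self._fields: Optional[List[str]] = None  # None means "read all fields"
        self._predicate: Optional[Callable[[Row], bool]] = None

    def with_projection(self, fields: List[str]) -> "PushDownReader":
        """Only materialize the listed fields (project push-down)."""
        self._fields = list(fields)
        return self

    def with_filter(self, predicate: Callable[[Row], bool]) -> "PushDownReader":
        """Drop non-matching rows at the source (filter push-down)."""
        self._predicate = predicate
        return self

    def read(self) -> Iterator[Row]:
        for row in self._rows:
            # The filter sees the full row, so it may use fields that the
            # projection later drops.
            if self._predicate is not None and not self._predicate(row):
                continue
            if self._fields is None:
                yield dict(row)
            else:
                yield {name: row[name] for name in self._fields}


rows = [
    {"id": 1, "name": "a", "score": 10},
    {"id": 2, "name": "b", "score": 3},
]
result = list(
    PushDownReader(rows).with_projection(["id"]).with_filter(lambda r: r["score"] > 5).read()
)
```

Across languages, the projected fields and the predicate could not travel as a Python callable; presumably they would need a serializable description, which is the kind of metadata that would have to be added to the portability proto.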

Andrew

On Tue, Jul 27, 2021 at 11:00 AM Jan Lukavský <je...@seznam.cz> wrote:

> +1
> On 7/27/21 7:55 PM, Chamikara Jayalath wrote:
>
>
>
> On Tue, Jul 27, 2021 at 10:51 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> I agree that adding PipelineOptions is technically (nearly) orthogonal,
>> it is not exactly independent, because the proposal emphasizes the need for
>> it. On the other hand, given the complexity of the original proposal, I
>> think that this change is of really low complexity, it might boil down to
>> adding a "repeated string options" field to the ExpansionRequest (or a map,
>> or whatever), and then reading it back during the expansion. I think that
>> taking the opportunity of making a somewhat larger change will be better in
>> this case, because it will instantly enable a broader usage. But - if we
>> want to treat it independently, can we create a tracking Jira for that?
>>
>
> We have a tracking Jira for the PipelineOptions part:
> https://issues.apache.org/jira/browse/BEAM-9449
>
>
>>  Jan
>> On 7/27/21 7:09 PM, Chamikara Jayalath wrote:
>>
>>
>>
>> On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>>
>>> Hi Cham,
>>>
>>> I think this approach is great, but what I'm missing is PipelineOptions
>>> for the expansion. This approach makes it possible (and practical) to use
>>> a single expansion service for many different Pipelines and Environments - I
>>> can imagine a "single" (logical) instance of the expansion service per data
>>> team/organisation (packaging the expansion service with possibly many
>>> dependencies might be non-trivial), that means that the expansion might
>>> need customization that is described precisely by PipelineOptions that are
>>> passed to the Pipeline that is used for the expansion. Other than that this
>>> looks great.
>>>
>>
>> Good point. I think that sending in the PipelineOptions as a part of the
>> expansion service is an orthogonal change that can be useful for both
>> config object based and class/type lookup based expansions. So I'd like to
>> consider that separately instead of tying into this work.
>>
>> Thanks,
>> Cham
>>
>>  Jan
>>> On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>>>
>>>
>>>
>>> On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> Looked at it. LGTM
>>>>
>>>> It looks portable enough to be useful for any future Go SDK based
>>>> Expansion services.
>>>>
>>>>  I do wonder if there are more general names than "class" but that's a
>>>> terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
>>>> inheritance based polymorphism.) Perhaps "type" is a good replacement,
>>>> which is in use in most languages in one capacity or another?
>>>>
>>>
>>> Yeah, probably we can call it "type" and describe what the field means
>>> for each SDK in a comment.
>>>
>>>
>>>> Certainly not a hard blocker.
>>>>
>>>>
>>>>
>>>> On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
>>>> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> Currently, to define Java cross-language transforms, users have to
>>>>> define three new Java classes: a Registrar, a Builder and a Config Object
>>>>> [1].
>>>>>
>>>>> While this might not be too hard for a Java programmer, learning Java
>>>>> and developing/building/releasing new classes just to use existing Java
>>>>> transforms may be cumbersome for Python and Go users. To further simplify
>>>>> the process for defining new Java cross-language transforms and usage of
>>>>> such transforms from other SDKs I would like to propose an update to the
>>>>> cross-language transform expansion protocol.
>>>>>
>>>>> Please see the following for details and let me know if you have any
>>>>> comments.
>>>>>
>>>>> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>>>>
>>>>> Thanks,
>>>>> Cham
>>>>>
>>>>> [1]
>>>>> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>>>>
>>>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Jan Lukavský <je...@seznam.cz>.
+1

On 7/27/21 7:55 PM, Chamikara Jayalath wrote:
>
>
> On Tue, Jul 27, 2021 at 10:51 AM Jan Lukavský <je.ik@seznam.cz> wrote:
>
>     I agree that adding PipelineOptions is technically (nearly)
>     orthogonal, it is not exactly independent, because the proposal
>     emphasizes the need for it. On the other hand, given the complexity
>     of the original proposal, I think that this change is of really
>     low complexity, it might boil down to adding a "repeated string
>     options" field to the ExpansionRequest (or a map, or whatever),
>     and then reading it back during the expansion. I think that taking
>     the opportunity of making a somewhat larger change will be better
>     in this case, because it will instantly enable a broader usage.
>     But - if we want to treat it independently, can we create a tracking
>     Jira for that?
>
>
> We have a tracking Jira for the PipelineOptions part:
> https://issues.apache.org/jira/browse/BEAM-9449
>
>      Jan
>
>     On 7/27/21 7:09 PM, Chamikara Jayalath wrote:
>>
>>
>>     On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je.ik@seznam.cz> wrote:
>>
>>         Hi Cham,
>>
>>         I think this approach is great, but what I'm missing is
>>         PipelineOptions for the expansion. This approach makes it
>>         possible (and practical) to use a single expansion service for
>>         many different Pipelines and Environments - I can imagine a
>>         "single" (logical) instance of the expansion service per data
>>         team/organisation (packaging the expansion service with
>>         possibly many dependencies might be non-trivial), that means
>>         that the expansion might need customization that is described
>>         precisely by PipelineOptions that are passed to the Pipeline
>>         that is used for the expansion. Other than that this looks great.
>>
>>     Good point. I think that sending in the PipelineOptions as a part
>>     of the expansion service is an orthogonal change that can be
>>     useful for both config object based and class/type lookup based
>>     expansions. So I'd like to consider that separately instead of
>>     tying into this work.
>>
>>     Thanks,
>>     Cham
>>
>>          Jan
>>
>>         On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>>>
>>>
>>>         On Mon, Jul 26, 2021 at 8:26 PM Robert Burke
>>>         <robert@frantil.com> wrote:
>>>
>>>             Looked at it. LGTM
>>>
>>>             It looks portable enough to be useful for any future Go
>>>             SDK based Expansion services.
>>>
>>>              I do wonder if there are more general names than
>>>             "class" but that's a terminology quibble anyway. (Go
>>>             doesn't use that term, as Go doesn't have inheritance
>>>             based polymorphism.) Perhaps "type" is a good
>>>             replacement, which is in use in most languages in one
>>>             capacity or another?
>>>
>>>
>>>         Yeah, probably we can call it "type" and describe what the
>>>         field means for each SDK in a comment.
>>>
>>>
>>>             Certainly not a hard blocker.
>>>
>>>
>>>
>>>             On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath
>>>             <chamikara@google.com> wrote:
>>>
>>>                 Hi All,
>>>
>>>                 Currently, to define Java cross-language transforms,
>>>                 users have to define three new Java classes: a
>>>                 Registrar, a Builder and a Config Object [1].
>>>
>>>                 While this might not be too hard for a Java
>>>                 programmer, learning Java and
>>>                 developing/building/releasing new classes just to
>>>                 use existing Java transforms may be cumbersome for
>>>                 Python and Go users. To further simplify the process
>>>                 for defining new Java cross-language transforms and
>>>                 usage of such transforms from other SDKs I would
>>>                 like to propose an update to the cross-language
>>>                 transform expansion protocol.
>>>
>>>                 Please see the following for details and let me know
>>>                 if you have any comments.
>>>                 https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>>
>>>                 Thanks,
>>>                 Cham
>>>
>>>                 [1]
>>>                 https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Jul 27, 2021 at 10:51 AM Jan Lukavský <je...@seznam.cz> wrote:

> I agree that adding PipelineOptions is technically (nearly) orthogonal, it
> is not exactly independent, because the proposal emphasizes the need for it.
> On the other hand, given the complexity of the original proposal, I think
> that this change is of really low complexity, it might boil down to adding
> a "repeated string options" field to the ExpansionRequest (or a map, or
> whatever), and then reading it back during the expansion. I think that
> taking the opportunity of making a somewhat larger change will be better in
> this case, because it will instantly enable a broader usage. But - if we
> want to treat it independently, can we create a tracking Jira for that?
>

We have a tracking Jira for the PipelineOptions part:
https://issues.apache.org/jira/browse/BEAM-9449


>  Jan
> On 7/27/21 7:09 PM, Chamikara Jayalath wrote:
>
>
>
> On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je...@seznam.cz> wrote:
>
>> Hi Cham,
>>
>> I think this approach is great, but what I'm missing is PipelineOptions
>> for the expansion. This approach makes it possible (and practical) to use
>> a single expansion service for many different Pipelines and Environments - I
>> can imagine a "single" (logical) instance of the expansion service per data
>> team/organisation (packaging the expansion service with possibly many
>> dependencies might be non-trivial), that means that the expansion might
>> need customization that is described precisely by PipelineOptions that are
>> passed to the Pipeline that is used for the expansion. Other than that this
>> looks great.
>>
>
> Good point. I think that sending in the PipelineOptions as a part of the
> expansion service is an orthogonal change that can be useful for both
> config object based and class/type lookup based expansions. So I'd like to
> consider that separately instead of tying into this work.
>
> Thanks,
> Cham
>
>  Jan
>> On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>>
>>
>>
>> On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <ro...@frantil.com> wrote:
>>
>>> Looked at it. LGTM
>>>
>>> It looks portable enough to be useful for any future Go SDK based
>>> Expansion services.
>>>
>>>  I do wonder if there are more general names than "class" but that's a
>>> terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
>>> inheritance based polymorphism.) Perhaps "type" is a good replacement,
>>> which is in use in most languages in one capacity or another?
>>>
>>
>> Yeah, probably we can call it "type" and describe what the field means
>> for each SDK in a comment.
>>
>>
>>> Certainly not a hard blocker.
>>>
>>>
>>>
>>> On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
>>> wrote:
>>>
>>>> Hi All,
>>>>
>>>> Currently, to define Java cross-language transforms, users have to
>>>> define three new Java classes: a Registrar, a Builder and a Config Object
>>>> [1].
>>>>
>>>> While this might not be too hard for a Java programmer, learning Java
>>>> and developing/building/releasing new classes just to use existing Java
>>>> transforms may be cumbersome for Python and Go users. To further simplify
>>>> the process for defining new Java cross-language transforms and usage of
>>>> such transforms from other SDKs I would like to propose an update to the
>>>> cross-language transform expansion protocol.
>>>>
>>>> Please see the following for details and let me know if you have any
>>>> comments.
>>>>
>>>> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>>>
>>>> Thanks,
>>>> Cham
>>>>
>>>> [1]
>>>> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>>>
>>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Jan Lukavský <je...@seznam.cz>.
While I agree that adding PipelineOptions is technically (nearly) orthogonal,
it is not exactly independent, because the proposal emphasizes the need
for it. On the other hand, given the complexity of the original
proposal, I think that this change is of really low complexity: it might
boil down to adding a "repeated string options" field to the
ExpansionRequest (or a map, or whatever), and then reading it back
during the expansion. I think that taking the opportunity to make a
somewhat larger change would be better in this case, because it would
instantly enable broader usage. But if we want to treat it
independently, can we create a tracking Jira for that?
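A sketch of the suggestion above, assuming the options travel as repeated `--name=value` strings and are parsed back into a map when the expansion runs. `ExpansionRequestSketch`, `read_options` and the URN are hypothetical; the real ExpansionRequest proto is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpansionRequestSketch:
    """Hypothetical request shape; not the actual Beam ExpansionRequest proto."""

    transform_urn: str
    payload: bytes = b""
    # Proposed addition: pipeline options as repeated "--name=value" strings.
    pipeline_options: List[str] = field(default_factory=list)


def read_options(request: ExpansionRequestSketch) -> Dict[str, str]:
    """Read the repeated option strings back into a name -> value map."""
    options: Dict[str, str] = {}
    for arg in request.pipeline_options:
        if not arg.startswith("--") or len(arg) == 2:
            raise ValueError(f"malformed option {arg!r}")
        # Split on the first "=" only, so values may themselves contain "=".
        name, sep, value = arg[2:].partition("=")
        # A bare flag such as "--streaming" is treated as "true".
        options[name] = value if sep else "true"
    return options


request = ExpansionRequestSketch(
    transform_urn="beam:transform:example:v1",  # hypothetical URN
    pipeline_options=["--tempLocation=gs://example-bucket/tmp", "--streaming"],
)
options = read_options(request)
```

A map field, which is also mentioned above, would avoid the string parsing; repeated strings arguably have the advantage of matching the command-line form in which pipeline options are usually supplied.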

  Jan

On 7/27/21 7:09 PM, Chamikara Jayalath wrote:
>
>
> On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je.ik@seznam.cz> wrote:
>
>     Hi Cham,
>
>     I think this approach is great, but what I'm missing is
>     PipelineOptions for the expansion. This approach makes it possible
>     (and practical) to use a single expansion service for many different
>     Pipelines and Environments - I can imagine a "single" (logical)
>     instance of the expansion service per data team/organisation
>     (packaging the expansion service with possibly many dependencies
>     might be non-trivial), that means that the expansion might need
>     customization that is described precisely by PipelineOptions that
>     are passed to the Pipeline that is used for the expansion. Other
>     than that this looks great.
>
> Good point. I think that sending in the PipelineOptions as a part of 
> the expansion service is an orthogonal change that can be useful for 
> both config object based and class/type lookup based expansions. So 
> I'd like to consider that separately instead of tying into this work.
>
> Thanks,
> Cham
>
>      Jan
>
>     On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>>
>>
>>     On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <robert@frantil.com> wrote:
>>
>>         Looked at it. LGTM
>>
>>         It looks portable enough to be useful for any future Go SDK
>>         based Expansion services.
>>
>>          I do wonder if there are more general names than "class" but
>>         that's a terminology quibble anyway. (Go doesn't use that
>>         term, as Go doesn't have inheritance based polymorphism.)
>>         Perhaps "type" is a good replacement, which is in use in most
>>         languages in one capacity or another?
>>
>>
>>     Yeah, probably we can call it "type" and describe what the field
>>     means for each SDK in a comment.
>>
>>
>>         Certainly not a hard blocker.
>>
>>
>>
>>         On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath
>>         <chamikara@google.com> wrote:
>>
>>             Hi All,
>>
>>             Currently, to define Java cross-language transforms,
>>             users have to define three new Java classes: a Registrar,
>>             a Builder and a Config Object [1].
>>
>>             While this might not be too hard for a Java programmer,
>>             learning Java and developing/building/releasing new
>>             classes just to use existing Java transforms may be
>>             cumbersome for Python and Go users. To further simplify
>>             the process for defining new Java cross-language
>>             transforms and usage of such transforms from other SDKs I
>>             would like to propose an update to the cross-language
>>             transform expansion protocol.
>>
>>             Please see the following for details and let me know if
>>             you have any comments.
>>             https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>
>>             Thanks,
>>             Cham
>>
>>             [1]
>>             https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Tue, Jul 27, 2021 at 12:18 AM Jan Lukavský <je...@seznam.cz> wrote:

> Hi Cham,
>
> I think this approach is great, but what I'm missing is PipelineOptions
> for the expansion. This approach makes it possible (and practical) to use
> a single expansion service for many different Pipelines and Environments - I
> can imagine a "single" (logical) instance of the expansion service per data
> team/organisation (packaging the expansion service with possibly many
> dependencies might be non-trivial), that means that the expansion might
> need customization that is described precisely by PipelineOptions that are
> passed to the Pipeline that is used for the expansion. Other than that this
> looks great.
>

Good point. I think that sending in the PipelineOptions as a part of the
expansion service is an orthogonal change that can be useful for both
config object based and class/type lookup based expansions. So I'd like to
consider that separately instead of tying into this work.

Thanks,
Cham

>  Jan
> On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>
>
>
> On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <ro...@frantil.com> wrote:
>
>> Looked at it. LGTM
>>
>> It looks portable enough to be useful for any future Go SDK based
>> Expansion services.
>>
>>  I do wonder if there are more general names than "class" but that's a
>> terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
>> inheritance based polymorphism.) Perhaps "type" is a good replacement,
>> which is in use in most languages in one capacity or another?
>>
>
> Yeah, probably we can call it "type" and describe what the field means for
> each SDK in a comment.
>
>
>> Certainly not a hard blocker.
>>
>>
>>
>> On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> Currently, to define Java cross-language transforms, users have to
>>> define three new Java classes: a Registrar, a Builder and a Config Object
>>> [1].
>>>
>>> While this might not be too hard for a Java programmer, learning Java
>>> and developing/building/releasing new classes just to use existing Java
>>> transforms may be cumbersome for Python and Go users. To further simplify
>>> the process for defining new Java cross-language transforms and usage of
>>> such transforms from other SDKs I would like to propose an update to the
>>> cross-language transform expansion protocol.
>>>
>>> Please see the following for details and let me know if you have any
>>> comments.
>>>
>>> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>>
>>> Thanks,
>>> Cham
>>>
>>> [1]
>>> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>>
>>

Re: A simpler way to define and use Java cross-language transforms

Posted by Jan Lukavský <je...@seznam.cz>.
Hi Cham,

I think this approach is great, but what I'm missing is PipelineOptions
for the expansion. This approach makes it possible (and practical) to
use a single expansion service for many different Pipelines and
Environments. I can imagine a "single" (logical) instance of the
expansion service per data team/organisation (packaging the expansion
service with possibly many dependencies might be non-trivial), which
means that the expansion might need customization that is described
precisely by the PipelineOptions passed to the Pipeline that is used
for the expansion. Other than that, this looks great.
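One way to picture the shared, per-team expansion service described above: the service holds team-wide defaults, and each request's PipelineOptions override them for that expansion only. This is an illustrative sketch; `SharedExpansionService`, `ExpandedTransform` and the option names are assumptions, not part of any Beam API.

```python
from dataclasses import dataclass
from typing import Dict, Mapping


@dataclass(frozen=True)
class ExpandedTransform:
    """Result of a (hypothetical) expansion: transform name plus options used."""

    name: str
    options: Dict[str, str]


class SharedExpansionService:
    """Hypothetical team-wide expansion service shared by many pipelines."""

    def __init__(self, defaults: Mapping[str, str]):
        # Team-wide defaults, e.g. the organisation's usual region or temp location.
        self._defaults = dict(defaults)

    def expand(self, name: str, request_options: Mapping[str, str]) -> ExpandedTransform:
        # Start from a fresh copy so one request can never change the defaults
        # seen by another pipeline, then let the request's options win.
        effective = dict(self._defaults)
        effective.update(request_options)
        return ExpandedTransform(name, effective)


service = SharedExpansionService({"region": "europe-west1", "tempLocation": "gs://team-a/tmp"})
pipeline_a = service.expand("ReadFromKafka", {"region": "us-central1"})
pipeline_b = service.expand("ReadFromKafka", {})
```

Copying the defaults per request keeps expansions for different pipelines isolated from each other, which matters once many teams' pipelines hit the same service.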

  Jan

On 7/27/21 6:16 AM, Chamikara Jayalath wrote:
>
>
> On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <robert@frantil.com> wrote:
>
>     Looked at it. LGTM
>
>     It looks portable enough to be useful for any future Go SDK based
>     Expansion services.
>
>      I do wonder if there are more general names than "class" but
>     that's a terminology quibble anyway. (Go doesn't use that term, as
>     Go doesn't have inheritance based polymorphism.) Perhaps "type" is
>     a good replacement, which is in use in most languages in one
>     capacity or another?
>
>
> Yeah, probably we can call it "type" and describe what the field means 
> for each SDK in a comment.
>
>
>     Certainly not a hard blocker.
>
>
>
>     On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath
>     <chamikara@google.com> wrote:
>
>         Hi All,
>
>         Currently, to define Java cross-language transforms, users
>         have to define three new Java classes: a Registrar, a Builder
>         and a Config Object [1].
>
>         While this might not be too hard for a Java programmer,
>         learning Java and developing/building/releasing new classes
>         just to use existing Java transforms may be cumbersome for
>         Python and Go users. To further simplify the process for
>         defining new Java cross-language transforms and usage of such
>         transforms from other SDKs I would like to propose an update
>         to the cross-language transform expansion protocol.
>
>         Please see the following for details and let me know if you
>         have any comments.
>         https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>
>         Thanks,
>         Cham
>
>         [1]
>         https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>

Re: A simpler way to define and use Java cross-language transforms

Posted by Chamikara Jayalath <ch...@google.com>.
On Mon, Jul 26, 2021 at 8:26 PM Robert Burke <ro...@frantil.com> wrote:

> Looked at it. LGTM
>
> It looks portable enough to be useful for any future Go SDK based
> Expansion services.
>
>  I do wonder if there are more general names than "class" but that's a
> terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
> inheritance based polymorphism.) Perhaps "type" is a good replacement,
> which is in use in most languages in one capacity or another?
>

Yeah, probably we can call it "type" and describe what the field means for
each SDK in a comment.


> Certainly not a hard blocker.
>
>
>
> On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
> wrote:
>
>> Hi All,
>>
>> Currently, to define Java cross-language transforms, users have to define
>> three new Java classes: a Registrar, a Builder and a Config Object [1].
>>
>> While this might not be too hard for a Java programmer, learning Java and
>> developing/building/releasing new classes just to use existing Java
>> transforms may be cumbersome for Python and Go users. To further simplify
>> the process for defining new Java cross-language transforms and usage of
>> such transforms from other SDKs I would like to propose an update to the
>> cross-language transform expansion protocol.
>>
>> Please see the following for details and let me know if you have any
>> comments.
>>
>> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>>
>> Thanks,
>> Cham
>>
>> [1]
>> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>>
>

Re: A simpler way to define and use Java cross-language transforms

Posted by Robert Burke <ro...@frantil.com>.
Looked at it. LGTM

It looks portable enough to be useful for any future Go-SDK-based expansion
services.

I do wonder if there are more general names than "class", but that's a
terminology quibble anyway. (Go doesn't use that term, as Go doesn't have
inheritance-based polymorphism.) Perhaps "type" is a good replacement,
since it is in use in most languages in one capacity or another?

Certainly not a hard blocker.
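The naming point above can be illustrated with a registry keyed by a language-neutral "type" string. This sketch is hypothetical (`register_type`, `construct` and the type name are made up): a Java expansion service could treat the string as a fully qualified class name, while an SDK without classes could map it to any registered constructor.

```python
from typing import Any, Callable, Dict, TypeVar

T = TypeVar("T", bound=Callable[..., Any])

# Maps a language-neutral "type" identifier to a constructor. For a Java
# expansion service the identifier could be a fully qualified class name;
# an SDK without classes could use any registered type name instead.
_TYPES: Dict[str, Callable[..., Any]] = {}


def register_type(type_name: str) -> Callable[[T], T]:
    """Decorator that registers a constructor under a type name."""

    def wrap(ctor: T) -> T:
        if type_name in _TYPES:
            raise ValueError(f"duplicate type {type_name!r}")
        _TYPES[type_name] = ctor
        return ctor

    return wrap


def construct(type_name: str, **kwargs: Any) -> Any:
    """Look a transform up by its type name and construct it."""
    if type_name not in _TYPES:
        raise LookupError(f"no transform registered for type {type_name!r}")
    return _TYPES[type_name](**kwargs)


@register_type("example.io.ReadLines")  # hypothetical type name
class ReadLines:
    def __init__(self, path: str):
        self.path = path


transform = construct("example.io.ReadLines", path="/tmp/input.txt")
```

Explicit registration like this works in any SDK, whereas looking a type up purely by name depends on runtime support such as Java's `Class.forName`; Go, for example, cannot find a type from its name at runtime unless it was registered beforehand.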



On Mon, Jul 26, 2021, 7:10 PM Chamikara Jayalath <ch...@google.com>
wrote:

> Hi All,
>
> Currently, to define Java cross-language transforms, users have to define
> three new Java classes: a Registrar, a Builder and a Config Object [1].
>
> While this might not be too hard for a Java programmer, learning Java and
> developing/building/releasing new classes just to use existing Java
> transforms may be cumbersome for Python and Go users. To further simplify
> the process for defining new Java cross-language transforms and usage of
> such transforms from other SDKs I would like to propose an update to the
> cross-language transform expansion protocol.
>
> Please see the following for details and let me know if you have any
> comments.
>
> https://docs.google.com/document/d/1ECXSWicE31K-vSxdb4qL6UcmovOAWvE-ZHFT3NTM654/edit?usp=sharing
>
> Thanks,
> Cham
>
> [1]
> https://beam.apache.org/documentation/programming-guide/#create-x-lang-transforms
>
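The thread refers to "class/type lookup based expansions" as the alternative to writing a Registrar, a Builder and a Config Object. Assuming such an expansion resolves an existing class by name, calls a constructor and then applies builder-style methods, the flow can be sketched in Python with `importlib` standing in for Java reflection. This is an assumption-laden illustration, not the protocol in the design document: `expand_by_lookup` is hypothetical, and the allowlist is an extra safeguard added here because the names come from a remote client.

```python
import importlib
from typing import AbstractSet, Any, Sequence, Tuple

# (method name, positional arguments) for one builder-style call.
BuilderCall = Tuple[str, Sequence[Any]]


def resolve(qualified_name: str) -> Any:
    """Resolve "package.module.Name" to an object, roughly like Java's Class.forName."""
    module_name, _, attr = qualified_name.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a qualified name, got {qualified_name!r}")
    return getattr(importlib.import_module(module_name), attr)


def expand_by_lookup(
    qualified_name: str,
    constructor_args: Sequence[Any],
    builder_calls: Sequence[BuilderCall],
    allowlist: AbstractSet[str],
) -> Any:
    """Construct an object by name, then apply builder methods in order."""
    # Assumption: only allowlisted names may be resolved, since the names
    # arrive from a remote client.
    if qualified_name not in allowlist:
        raise PermissionError(f"{qualified_name!r} is not allowlisted")
    obj = resolve(qualified_name)(*constructor_args)
    for method_name, args in builder_calls:
        result = getattr(obj, method_name)(*args)
        # Builder methods usually return the configured object; methods that
        # mutate in place return None, in which case keep the current object.
        obj = obj if result is None else result
    return obj


# Stand-in "transforms" from the standard library, just to exercise the flow.
text = expand_by_lookup(
    "builtins.str", ["beam"], [("upper", []), ("replace", ["A", "4"])], {"builtins.str"}
)
items = expand_by_lookup(
    "builtins.list", [], [("append", [1]), ("append", [2])], {"builtins.list"}
)
```

The remote SDK only has to send strings and arguments; no new Java classes need to be written or released for a transform that already exposes a constructor and builder methods.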