Posted to dev@beam.apache.org by Brian Hulette <bh...@google.com> on 2021/09/03 19:52:37 UTC

Re: A simpler way to define and use Java cross-language transforms

> I would hope that any IO that offers a generic SchemaIO
> interface will be trivially wrappable as an external transform.

This is a bit of an aside, but I just wanted to point out that this was a
primary goal of the initial SchemaIO project. The idea was to allow Java IO
developers to implement a single interface to make IOs usable from SQL and
from other SDKs.
To that end, Scott created ExternalSchemaIOTransformRegistrar [1] to find
SchemaIO implementations with ServiceLoader and register external
transforms for them (one for the read side and one for the write).
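For anyone unfamiliar with the pattern, here is a minimal sketch of ServiceLoader-based discovery that registers a read and a write entry per provider. This is not the actual Beam code: the IoProvider interface, the "example:schemaio:" URN prefix, and the method names are all hypothetical stand-ins, and the real registrar in [1] builds external transform builders rather than a plain map.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.ServiceLoader;

public class SchemaIoRegistrarSketch {

    /** Hypothetical stand-in for a SchemaIO provider; not Beam's real interface. */
    public interface IoProvider {
        String identifier();
    }

    /**
     * Registers two entries per provider, one for the read side and one for
     * the write side. The URN format here is made up for illustration.
     */
    public static Map<String, IoProvider> knownTransforms(Iterable<IoProvider> providers) {
        Map<String, IoProvider> registered = new LinkedHashMap<>();
        for (IoProvider provider : providers) {
            String base = "example:schemaio:" + provider.identifier();
            registered.put(base + ":read", provider);
            registered.put(base + ":write", provider);
        }
        return registered;
    }

    /**
     * Discovers providers on the classpath. ServiceLoader only finds
     * implementations listed in a META-INF/services file, so this returns
     * an empty map when none are declared.
     */
    public static Map<String, IoProvider> discover() {
        return knownTransforms(ServiceLoader.load(IoProvider.class));
    }

    public static void main(String[] args) {
        // A hand-built provider stands in for one found via ServiceLoader.
        IoProvider demo = () -> "demo";
        System.out.println(knownTransforms(List.of(demo)).keySet());
    }
}
```

The point of the design is that an IO author only implements the provider interface and declares it as a service; the registrar takes care of exposing both directions without any per-IO wrapper code.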

[1]
https://github.com/apache/beam/blob/master/sdks/java/extensions/schemaio-expansion-service/src/main/java/org/apache/beam/sdk/extensions/schemaio/expansion/ExternalSchemaIOTransformRegistrar.java

On Tue, Jul 27, 2021 at 6:28 PM Robert Bradshaw <ro...@google.com> wrote:

> On Tue, Jul 27, 2021 at 1:31 PM Chamikara Jayalath <ch...@google.com>
> wrote:
> >
> > On Tue, Jul 27, 2021 at 11:03 AM Andrew Pilloud <ap...@google.com>
> wrote:
> >>
> >> Hi Cham,
> >>
> >> Are you aware of the SchemaIO and SchemaIOProvider interfaces (and
> their design)? One of SchemaIO's goals is putting a generic interface on IOs
> so users don't have to construct wrappers for cross language use. It looks
> like this new interface can probably construct a Java SchemaIO, so it
> sounds reasonable to me. (This might be something worth testing when you
> implement it.)
> >>
> >> We are starting to add additional functionality (support for automatic
> optimizations, such as filter and project push-down). I'm not sure how this
> is going to work cross language yet, but we will probably end up adding
> metadata needed to reconstruct the transform to the portability proto.
> >
> > Went through it a bit and I think the two designs are complementary.
> Schema-aware IO will let some I/O transform authors make their transforms
> easily accessible from a remote SDK using a SQL query, while the current
> proposal makes defining/using Java transforms easier for non-Java
> programmers. I think both proposals will help reduce the barrier to entry
> for cross-language and will help make more Java transforms available to
> other SDKs.
>
> SQL benefits from being able to declare an IO in textual form.
> Cross-language seeks to establish a standard to describe an IO in a
> language-agnostic form. At their core is the desire to be able to
> instantiate an IO based on a name (which is likely linked to an
> implementation via a registrar) and a set of named parameters of
> "basic" type. I would hope that any IO that offers a generic SchemaIO
> interface will be trivially wrappable as an external transform.
>
> I do agree, however, that this external transform is more general than
> just IOs and transforms accepting/providing Row types.
>