Posted to issues@beam.apache.org by "Maximilian Michels (JIRA)" <ji...@apache.org> on 2019/04/23 10:18:00 UTC

[jira] [Commented] (BEAM-6730) Enable configuration of Java transforms (specifically IO) through other SDKs

    [ https://issues.apache.org/jira/browse/BEAM-6730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823932#comment-16823932 ] 

Maximilian Michels commented on BEAM-6730:
------------------------------------------

Summary of what has been done for this issue:

In this issue we've added the capability to externally configure Java transforms. To use a Java transform from another SDK, e.g. Python, the transform has to be made configurable. 

A transform is configurable when an {{ExternalTransformBuilder}} has been registered for it with the {{ExpansionService}}. The {{ExternalTransformBuilder}} takes a configuration POJO that is initialized with the configuration entries sent by the SDK. The POJO needs a no-argument constructor and a setter for each of its fields. The builder is registered via an {{ExternalTransformRegistrar}}, which specifies the URN and the builder that belongs to it.
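
The registration mechanism described above can be modeled in a few lines of plain Python. This is only a sketch under stated assumptions: it does not use Beam, and every name in it ({{SequenceConfig}}, {{build_sequence}}, {{REGISTRY}}, {{expand}}, and the URN string) is hypothetical. It illustrates why the configuration object needs a no-argument constructor and one setter per field: the service creates it empty and then fills it from the configuration entries it receives.

```python
class SequenceConfig:
    """Hypothetical configuration object: no-arg constructor plus setters."""

    def __init__(self):
        self.start = None
        self.stop = None

    def set_start(self, value):
        self.start = value

    def set_stop(self, value):
        self.stop = value


def build_sequence(config):
    """Hypothetical builder: turns a populated config into a 'transform'.

    A plain range stands in for the real transform here.
    """
    return range(config.start, config.stop)


# Hypothetical registrar: maps a URN to (configuration class, builder).
REGISTRY = {
    "example:transform:generate_sequence:v1": (SequenceConfig, build_sequence),
}


def expand(urn, entries):
    """Look up the builder for `urn` and configure it from `entries`."""
    config_cls, builder = REGISTRY[urn]
    config = config_cls()  # relies on the no-argument constructor
    for name, value in entries.items():
        setter = getattr(config, "set_" + name)  # relies on a setter per field
        setter(value)
    return builder(config)


transform = expand("example:transform:generate_sequence:v1", {"start": 0, "stop": 3})
print(list(transform))
```

In this model, an entry without a matching setter fails with an {{AttributeError}}, which is the reason every field needs one.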

On the Python side, {{ExternalTransform}} needs to be supplied with a URN, a payload, and the expansion service address. The payload is a configuration map which conforms to {{ExternalConfigurationPayload}} in the proto. Users are not expected to use {{ExternalTransform}} directly, but should instead use one of the wrappers. In 2.12.0, {{GenerateSequence}} is the first externally configured transform. BEAM-7029 adds {{KafkaIO}}.
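
The wrapper idea in the paragraph above can be sketched without Beam as follows. The names here ({{ExternalRequest}}, {{generate_sequence}}, the URN string, and the default address) are hypothetical stand-ins, not the Python SDK's API. The point is only that a wrapper fixes the URN and turns ordinary keyword arguments into the configuration map, so the user supplies nothing but the transform options and, optionally, the expansion service address.

```python
from dataclasses import dataclass, field


@dataclass
class ExternalRequest:
    """Hypothetical stand-in for the three inputs of an external transform."""
    urn: str
    payload: dict = field(default_factory=dict)
    expansion_service: str = ""


def generate_sequence(start, stop=None, expansion_service="localhost:8097"):
    """Hypothetical wrapper: hides the URN and the payload construction.

    The default address is an assumption made for this sketch.
    """
    options = {"start": start, "stop": stop}
    # Assumption: unset options are simply left out of the configuration map.
    payload = {name: value for name, value in options.items() if value is not None}
    return ExternalRequest(
        urn="example:transform:generate_sequence:v1",  # assumed URN
        payload=payload,
        expansion_service=expansion_service,
    )


request = generate_sequence(start=0, stop=10)
print(request.urn, request.payload, request.expansion_service)
```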

> Enable configuration of Java transforms (specifically IO) through other SDKs
> ----------------------------------------------------------------------------
>
>                 Key: BEAM-6730
>                 URL: https://issues.apache.org/jira/browse/BEAM-6730
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-flink, sdk-java-core, sdk-py-core
>            Reporter: Maximilian Michels
>            Assignee: Maximilian Michels
>            Priority: Major
>             Fix For: 2.12.0
>
>          Time Spent: 11h 50m
>  Remaining Estimate: 0h
>
> Since https://github.com/apache/beam/pull/7316 we can reference external transforms, which are transforms only available in a "foreign" SDK. This allows us to fill the gap in terms of missing transforms in the Python and Go SDKs, specifically IO transforms.
> We can start collecting/exposing transforms that Beam users need. The following transforms could be interesting:
> - KafkaIO / KinesisIO
> - CassandraIO / ElasticsearchIO / Hbase / Redis
> - JDBC
> - S3 file system
> - GenerateSequence
> See also https://s.apache.org/beam-cross-language-io and BEAM-6485.
> CC [~robertwb] [~chamikara] [~thw]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)