Posted to dev@beam.apache.org by Robert Burke <re...@google.com> on 2020/01/27 18:10:28 UTC

Re: Apache Beam KafkaIO Transform "go" available?

+cc dev@beam
Hello Daisy!
There presently isn't a Kafka transform for Go.

The Go SDK is still experimental, largely due to its lack of scalable IO
support, which is why the Go SDK isn't represented on the built-in IO page
<https://beam.apache.org/documentation/io/built-in/>.

There's presently no way for an SDK user to write a streaming source in the
Go SDK, since there's no mechanism for a DoFn to "self terminate" bundles,
so as to allow for scalability and windowing from streaming sources.

However, SplittableDoFns are on their way, and will eventually be the
solution for writing these.

At present, the Beam Go SDK IOs haven't been tested and vetted for
production use. Until the initial SplittableDoFn support is added to the Go
SDK, batch transforms cannot split, and can't scale beyond a single worker
thread. The batch version of SplittableDoFn support should land in the next
few months, and the streaming version should land a few months after that,
after which a Kafka IO can be developed.
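
To illustrate what "splitting" means here, below is a minimal sketch in
plain Go. It does not use the Beam Go SDK or any Beam API; the offsetRange
type and the splitEvenly function are hypothetical names made up for this
illustration. The assumption is that a source describes its work as a range
of offsets, and that dividing the range lets several workers each take a
piece instead of one worker processing everything.

```go
package main

import "fmt"

// offsetRange is a hypothetical half-open range [Start, End) of work,
// for example record offsets in a file. It is not a Beam type.
type offsetRange struct {
	Start, End int64
}

// splitEvenly divides r into at most n contiguous sub-ranges of nearly
// equal size. It stands in for the "split" step a splittable source
// would offer to a runner; it is an illustration, not a Beam API.
func splitEvenly(r offsetRange, n int) []offsetRange {
	size := r.End - r.Start
	if n < 1 || size <= 0 {
		return nil
	}
	if int64(n) > size {
		n = int(size) // never produce empty sub-ranges
	}
	parts := make([]offsetRange, 0, n)
	start := r.Start
	for i := 0; i < n; i++ {
		// Spread the remainder over the first (size % n) parts.
		length := size / int64(n)
		if int64(i) < size%int64(n) {
			length++
		}
		parts = append(parts, offsetRange{Start: start, End: start + length})
		start += length
	}
	return parts
}

func main() {
	// Without splitting, one worker handles [0, 10) alone.
	// With splitting, three workers can each take one part.
	for i, p := range splitEvenly(offsetRange{Start: 0, End: 10}, 3) {
		fmt.Printf("worker %d: [%d, %d)\n", i, p.Start, p.End)
	}
}
```

A transform that cannot split is the n = 1 case: the whole range stays with
a single worker, which is the scaling limit described above.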

I wish I had better news for you, but I can say progress is being made.

Robert Burke


On Sun, Jan 26, 2020 at 10:14 PM Daisy Wu <da...@yahoo.com> wrote:

> Hi, Robert,
>
> I found your name from the Apache Beam WIKI page.
>
> I am working on building a data ingestion pipeline using Apache Beam "go"
> SDK.
>
> My pipeline is to consume data from a Kafka queue and persist the data to
> Google Cloud Bigtable (and/or to another Kafka topic).
>
> So far, I have not been able to find a Kafka IO Connector (also known as
> Apache I/O Transform) written in "go" (I was able to find a java version,
> however).
>
> Here's a link to the supported Apache Beam built-in I/O transforms:
> https://beam.apache.org/documentation/io/built-in/
>
> I am looking for the "go" equivalent of the following Java code:
>
>     pipeline.apply("kafka_deserialization", KafkaIO.<String, String>read()
>         .withBootstrapServers(KAFKA_BROKER)
>         .withTopic(KAFKA_TOPIC)
>         .withConsumerConfigUpdates(CONSUMER_CONFIG)
>         .withKeyDeserializer(StringDeserializer.class)
>         .withValueDeserializer(StringDeserializer.class))
>
> Do you have any information on the availability of a KafkaIO
> Connector/Transform for the "go" SDK/library?
>
> Any help or information would be much appreciated.
>
> Thank you.
>
>
> Daisy Wu
>

Re: Apache Beam KafkaIO Transform "go" available?

Posted by Daisy Wu <da...@yahoo.com>.
Thank you so much for your prompt reply.
This really helped us make design decisions for our future projects.
I appreciate your help.

Daisy Wu 
