Posted to dev@beam.apache.org by Łukasz Gajowy <lg...@apache.org> on 2019/05/09 13:32:09 UTC

Streaming pipelines in all SDKs!

Hi,

Part of the work needed to create tests for Core Apache Beam
operations is enabling both batch and streaming testing scenarios in all
SDKs (not only Java, so lots of portability usage here). I gathered some
thoughts on how (I think) this could be done at the current state of Beam:

https://s.apache.org/portable-load-tests

I added some questions at the end of the doc but I will paste them here too
for visibility:

   - What is the status of Cross Language Support for Go SDK? Is it
   non-trivial to enable such support (if it doesn't exist yet)?
   - According to other contributors' knowledge/experience: I noticed that
   streaming with KafkaIO is currently supported by wrapping the
   ExternalTransform in Python SDK. Do you think that streaming pipelines will
   "just work" with the current state of portability if I do the same for
   UnboundedSyntheticSource, or is there something else missing? (A rough
   sketch of what such a wrapper could look like follows this list.)
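
A rough sketch of what the Python side of such a wrapper could look like is
below. This is only an illustration: the URN, the configuration keys and the
expansion service address are made up, a matching Java transform (registered
with the expansion service) would still have to exist, and the exact
payload-builder API depends on the Beam version.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.external import (ExternalTransform,
                                             ImplicitSchemaPayloadBuilder)

# Hypothetical URN; the expansion service must have a transform registered
# under it for the expansion to succeed.
SYNTHETIC_UNBOUNDED_URN = 'beam:external:java:unbounded_synthetic_source:v1'

options = PipelineOptions(['--streaming'])
with beam.Pipeline(options=options) as p:
    _ = (
        p
        | 'SyntheticUnboundedRead' >> ExternalTransform(
            SYNTHETIC_UNBOUNDED_URN,
            # Assumed configuration keys, loosely mirroring SyntheticOptions.
            ImplicitSchemaPayloadBuilder({
                'num_records': 1000000,
                'key_size': 10,
                'value_size': 90,
            }),
            expansion_service='localhost:8097')  # assumed running expansion service
        | 'Log' >> beam.Map(print))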

BTW: great to see Cross Language Support happening. Thanks for doing this!
:)

Thanks,
Łukasz

Re: Streaming pipelines in all SDKs!

Posted by Ismaël Mejía <ie...@gmail.com>.
Thanks

On Thu, Jun 13, 2019 at 2:02 PM Łukasz Gajowy <lg...@apache.org> wrote:
>
> Created a PR: https://github.com/apache/beam/pull/8846
>
> On Wed, Jun 12, 2019 at 11:40 Ismaël Mejía <ie...@gmail.com> wrote:
>>
>> Can you please add this to the design documents webpage.
>> https://beam.apache.org/contribute/design-documents/
>>
>> On Fri, May 10, 2019 at 11:50 AM Maximilian Michels <mx...@apache.org> wrote:
>> >
>> > > So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
>> >
>> > It's in the translation code:
>> > https://github.com/apache/beam/blob/6679b00138a5b82a6a55e7bc94c453957cea501c/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java#L216
>> >
>> > For migration I think that's a valid path, especially because Runners
>> > already have the translation code in place. We can later swap-out the
>> > UnboundedSource translation with the SDF wrapper.
>> >
>> > -Max
>> >
>> > On 09.05.19 22:46, Robert Bradshaw wrote:
>> > > From: Chamikara Jayalath <ch...@google.com>
>> > > Date: Thu, May 9, 2019 at 7:49 PM
>> > > To: dev
>> > >
>> > >> From: Maximilian Michels <mx...@apache.org>
>> > >> Date: Thu, May 9, 2019 at 9:21 AM
>> > >> To: <de...@beam.apache.org>
>> > >>
>> > >>> Thanks for sharing your ideas for load testing!
>> > >>>
>> > >>>> According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing?
>> > >>>
>> > >>> Basically yes, but it requires a bit more effort than just wrapping
>> > >>> about ExternalTransform. You need to provide an ExternalTransformBuilder
>> > >>> for the transform to be configured externally.
>> > >>>
>> > >>> In portability UnboundedSources can only be supported via SDF. To still
>> > >>> be able to use legacy IO which uses UnboundedSource the Runner has to
>> > >>> supply this capability (which the Flink Runner supports). This will
>> > >>> likely go away if we have an UnboundedSource SDF Wrapper :)
>> > >>
>> > >>
>> > >> So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
>> > >>
>> > >> Agree, that we cannot properly support cross-language unbounded sources till we have SDF and a proper unbounded source to SDF wrapper.
>> > >
>> > > That is correct. Go will need SDF support as well.
>> > >
>> > > As waiting on implementing the expansion service, except for the
>> > > vending of extra artifacts (which will be an extension), we discussed
>> > > this earlier and it's considered stable and ready to build on now.
>> > >

Re: Streaming pipelines in all SDKs!

Posted by Łukasz Gajowy <lg...@apache.org>.
Created a PR: https://github.com/apache/beam/pull/8846

On Wed, Jun 12, 2019 at 11:40 Ismaël Mejía <ie...@gmail.com> wrote:

> Can you please add this to the design documents webpage.
> https://beam.apache.org/contribute/design-documents/
>
> On Fri, May 10, 2019 at 11:50 AM Maximilian Michels <mx...@apache.org>
> wrote:
> >
> > > So, FlinkRunner has some sort of special support for executing
> UnboundedSource via the runner in the portable world ? I see a transform
> override for bounded sources in PortableRunner [1] but nothing for
> unbounded sources.
> >
> > It's in the translation code:
> >
> https://github.com/apache/beam/blob/6679b00138a5b82a6a55e7bc94c453957cea501c/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java#L216
> >
> > For migration I think that's a valid path, especially because Runners
> > already have the translation code in place. We can later swap-out the
> > UnboundedSource translation with the SDF wrapper.
> >
> > -Max
> >
> > On 09.05.19 22:46, Robert Bradshaw wrote:
> > > From: Chamikara Jayalath <ch...@google.com>
> > > Date: Thu, May 9, 2019 at 7:49 PM
> > > To: dev
> > >
> > >> From: Maximilian Michels <mx...@apache.org>
> > >> Date: Thu, May 9, 2019 at 9:21 AM
> > >> To: <de...@beam.apache.org>
> > >>
> > >>> Thanks for sharing your ideas for load testing!
> > >>>
> > >>>> According to other contributors knowledge/experience: I noticed
> that streaming with KafkaIO is currently supported by wrapping the
> ExternalTransform in Python SDK. Do you think that streaming pipelines will
> "just work" with the current state of portability if I do the same for
> UnboundedSyntheticSource or is there something else missing?
> > >>>
> > >>> Basically yes, but it requires a bit more effort than just wrapping
> > >>> about ExternalTransform. You need to provide an
> ExternalTransformBuilder
> > >>> for the transform to be configured externally.
> > >>>
> > >>> In portability UnboundedSources can only be supported via SDF. To
> still
> > >>> be able to use legacy IO which uses UnboundedSource the Runner has to
> > >>> supply this capability (which the Flink Runner supports). This will
> > >>> likely go away if we have an UnboundedSource SDF Wrapper :)
> > >>
> > >>
> > >> So, FlinkRunner has some sort of special support for executing
> UnboundedSource via the runner in the portable world ? I see a transform
> override for bounded sources in PortableRunner [1] but nothing for
> unbounded sources.
> > >>
> > >> Agree, that we cannot properly support cross-language unbounded
> sources till we have SDF and a proper unbounded source to SDF wrapper.
> > >
> > > That is correct. Go will need SDF support as well.
> > >
> > > As waiting on implementing the expansion service, except for the
> > > vending of extra artifacts (which will be an extension), we discussed
> > > this earlier and it's considered stable and ready to build on now.
> > >
>

Re: Streaming pipelines in all SDKs!

Posted by Ismaël Mejía <ie...@gmail.com>.
Can you please add this to the design documents webpage.
https://beam.apache.org/contribute/design-documents/

On Fri, May 10, 2019 at 11:50 AM Maximilian Michels <mx...@apache.org> wrote:
>
> > So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
>
> It's in the translation code:
> https://github.com/apache/beam/blob/6679b00138a5b82a6a55e7bc94c453957cea501c/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java#L216
>
> For migration I think that's a valid path, especially because Runners
> already have the translation code in place. We can later swap-out the
> UnboundedSource translation with the SDF wrapper.
>
> -Max
>
> On 09.05.19 22:46, Robert Bradshaw wrote:
> > From: Chamikara Jayalath <ch...@google.com>
> > Date: Thu, May 9, 2019 at 7:49 PM
> > To: dev
> >
> >> From: Maximilian Michels <mx...@apache.org>
> >> Date: Thu, May 9, 2019 at 9:21 AM
> >> To: <de...@beam.apache.org>
> >>
> >>> Thanks for sharing your ideas for load testing!
> >>>
> >>>> According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing?
> >>>
> >>> Basically yes, but it requires a bit more effort than just wrapping
> >>> about ExternalTransform. You need to provide an ExternalTransformBuilder
> >>> for the transform to be configured externally.
> >>>
> >>> In portability UnboundedSources can only be supported via SDF. To still
> >>> be able to use legacy IO which uses UnboundedSource the Runner has to
> >>> supply this capability (which the Flink Runner supports). This will
> >>> likely go away if we have an UnboundedSource SDF Wrapper :)
> >>
> >>
> >> So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
> >>
> >> Agree, that we cannot properly support cross-language unbounded sources till we have SDF and a proper unbounded source to SDF wrapper.
> >
> > That is correct. Go will need SDF support as well.
> >
> > As waiting on implementing the expansion service, except for the
> > vending of extra artifacts (which will be an extension), we discussed
> > this earlier and it's considered stable and ready to build on now.
> >

Re: Streaming pipelines in all SDKs!

Posted by Maximilian Michels <mx...@apache.org>.
> So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.

It's in the translation code: 
https://github.com/apache/beam/blob/6679b00138a5b82a6a55e7bc94c453957cea501c/runners/flink/src/main/java/org/apache/beam/runners/flink/FlinkStreamingPortablePipelineTranslator.java#L216

For migration I think that's a valid path, especially because Runners 
already have the translation code in place. We can later swap-out the 
UnboundedSource translation with the SDF wrapper.

-Max

On 09.05.19 22:46, Robert Bradshaw wrote:
> From: Chamikara Jayalath <ch...@google.com>
> Date: Thu, May 9, 2019 at 7:49 PM
> To: dev
> 
>> From: Maximilian Michels <mx...@apache.org>
>> Date: Thu, May 9, 2019 at 9:21 AM
>> To: <de...@beam.apache.org>
>>
>>> Thanks for sharing your ideas for load testing!
>>>
>>>> According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing?
>>>
>>> Basically yes, but it requires a bit more effort than just wrapping
>>> about ExternalTransform. You need to provide an ExternalTransformBuilder
>>> for the transform to be configured externally.
>>>
>>> In portability UnboundedSources can only be supported via SDF. To still
>>> be able to use legacy IO which uses UnboundedSource the Runner has to
>>> supply this capability (which the Flink Runner supports). This will
>>> likely go away if we have an UnboundedSource SDF Wrapper :)
>>
>>
>> So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
>>
>> Agree, that we cannot properly support cross-language unbounded sources till we have SDF and a proper unbounded source to SDF wrapper.
> 
> That is correct. Go will need SDF support as well.
> 
> As waiting on implementing the expansion service, except for the
> vending of extra artifacts (which will be an extension), we discussed
> this earlier and it's considered stable and ready to build on now.
> 

Re: Streaming pipelines in all SDKs!

Posted by Robert Bradshaw <ro...@google.com>.
From: Chamikara Jayalath <ch...@google.com>
Date: Thu, May 9, 2019 at 7:49 PM
To: dev

> From: Maximilian Michels <mx...@apache.org>
> Date: Thu, May 9, 2019 at 9:21 AM
> To: <de...@beam.apache.org>
>
>> Thanks for sharing your ideas for load testing!
>>
>> > According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing?
>>
>> Basically yes, but it requires a bit more effort than just wrapping
>> about ExternalTransform. You need to provide an ExternalTransformBuilder
>> for the transform to be configured externally.
>>
>> In portability UnboundedSources can only be supported via SDF. To still
>> be able to use legacy IO which uses UnboundedSource the Runner has to
>> supply this capability (which the Flink Runner supports). This will
>> likely go away if we have an UnboundedSource SDF Wrapper :)
>
>
> So, FlinkRunner has some sort of special support for executing UnboundedSource via the runner in the portable world ? I see a transform override for bounded sources in PortableRunner [1] but nothing for unbounded sources.
>
> Agree, that we cannot properly support cross-language unbounded sources till we have SDF and a proper unbounded source to SDF wrapper.

That is correct. Go will need SDF support as well.

As for waiting on implementing the expansion service: except for the
vending of extra artifacts (which will be an extension), we discussed
this earlier and it's considered stable and ready to build on now.

Re: Streaming pipelines in all SDKs!

Posted by Chamikara Jayalath <ch...@google.com>.
From: Maximilian Michels <mx...@apache.org>
Date: Thu, May 9, 2019 at 9:21 AM
To: <de...@beam.apache.org>

Thanks for sharing your ideas for load testing!
>
> > According to other contributors knowledge/experience: I noticed that
> streaming with KafkaIO is currently supported by wrapping the
> ExternalTransform in Python SDK. Do you think that streaming pipelines will
> "just work" with the current state of portability if I do the same for
> UnboundedSyntheticSource or is there something else missing?
>
> Basically yes, but it requires a bit more effort than just wrapping
> about ExternalTransform. You need to provide an ExternalTransformBuilder
> for the transform to be configured externally.
>
> In portability UnboundedSources can only be supported via SDF. To still
> be able to use legacy IO which uses UnboundedSource the Runner has to
> supply this capability (which the Flink Runner supports). This will
> likely go away if we have an UnboundedSource SDF Wrapper :)
>

So, FlinkRunner has some sort of special support for executing
UnboundedSource via the runner in the portable world? I see a transform
override for bounded sources in PortableRunner [1] but nothing for
unbounded sources.

Agreed, we cannot properly support cross-language unbounded sources
till we have SDF and a proper unbounded-source-to-SDF wrapper.

[1]
https://github.com/apache/beam/blob/master/runners/reference/java/src/main/java/org/apache/beam/runners/reference/PortableRunner.java#L140

>
> As for the stability of the expansion protocol, I think it's relatively
> stable and won't changed fundamentally.
>
> Cheers,
> Max
>
> On 09.05.19 16:21, Łukasz Gajowy wrote:
> > My recommendation is that we make sure the protocol is stable and
> > implemented on the Python SDK side before starting the Go SDK side,
> > since that work is already in progress.
> >
> > +1 This is exactly the roadmap that I had in mind - start with
> > externalizing and using the synthetic sources in Python SDK and then
> > proceed with Go. Still worth knowing what's going on there so that's why
> > I asked. :)
> >
> > Thanks,
> > Łukasz
> >
> > On Thu, May 9, 2019 at 16:03 Robert Burke <robert@frantil.com
> > <ma...@frantil.com>> wrote:
> >
> >     Currently the Go SDK doesn't have cross Language support
> >     implemented. My recommendation is that we make sure the protocol is
> >     stable and implemented on the Python SDK side before starting the Go
> >     SDK side, since that work is already in progress.
> >
> >       The relevant state of the Go SDK:
> >     * beam.External exists, for specifying go transforms. (See the Go
> >     SDK's PubSubIO for an example)
> >     * the generated go code for the protos, including the Expansions
> >     service API was refreshed last week. Meaning the work isn't blocked
> >     on that.
> >
> >     In principle, the work would be to
> >     * ensure that the SDK side of job submission connects and looks up
> >     relevant transforms against the Expansion service if needed, and
> >     does the appropriate pipeline graph surgery.
> >        *This may be something that's handled as some kind of hook and
> >     registration process for generally handling external transforms SDK
> >     side.
> >     * Have some kind of external transform to specify and configure on
> >     the Go side.
> >
> >     Most of this can be ported from the Python implementation once it's
> >     stabilized.
> >
> >     As with all my recommendations on how to do things with the Go SDK,
> >     feel free to ignore it and forge ahead. I look forward to someone
> >     tackling this, whenever it happens!
> >
> >     Your friendly neighborhood distributed gopher wrangler,
> >     Robert Burke
> >
> >     Related:
> >     PR 8531 [1] begins adding automates testing of the Go SDK against
> >     Flink, which should assist with ensuring this eventual work keeps
> >     working.
> >
> >     [1]: https://github.com/apache/beam/pull/8531
> >
> >     On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lgajowy@apache.org
> >     <ma...@apache.org>> wrote:
> >
> >         Hi,
> >
> >         part of our work that needs to be done to create tests for Core
> >         Apache Beam operations is to enable both batch and streaming
> >         testing scenarios in all SDKs (not only Java, so lot's of
> >         portability usage here). I gathered some thoughts on how (I
> >         think) this could be done at the current state of Beam:
> >
> >         https://s.apache.org/portable-load-tests
> >
> >         I added some questions at the end of the doc but I will paste
> >         them here too for visibility:
> >
> >           * What is the status of Cross Language Support for Go SDK? Is
> >             it non-trivial to enable such support (if it doesn't exist
> yet)?
> >           * According to other contributors knowledge/experience: I
> >             noticed that streaming with KafkaIO is currently supported
> >             by wrapping the ExternalTransform in Python SDK. Do you
> >             think that streaming pipelines will "just work" with the
> >             current state of portability if I do the same for
> >             UnboundedSyntheticSource or is there something else missing?
> >
> >         BTW: great to see Cross Language Support happening. Thanks for
> >         doing this! :)
> >
> >         Thanks,
> >         Łukasz
> >
> >
> >
> >
>

Re: Streaming pipelines in all SDKs!

Posted by Maximilian Michels <mx...@apache.org>.
Thanks for sharing your ideas for load testing!

> According to other contributors knowledge/experience: I noticed that streaming with KafkaIO is currently supported by wrapping the ExternalTransform in Python SDK. Do you think that streaming pipelines will "just work" with the current state of portability if I do the same for UnboundedSyntheticSource or is there something else missing? 

Basically yes, but it requires a bit more effort than just wrapping
it in ExternalTransform. You need to provide an ExternalTransformBuilder
for the transform to be configured externally.
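
For reference, a sketch of how the existing Kafka wrapper is consumed from
the Python side (module path and option names here follow later Beam
releases -- the wrapper originally lived under apache_beam.io.external.kafka --
and the job endpoint address is an assumption):

import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions([
    '--streaming',
    '--runner=PortableRunner',
    '--job_endpoint=localhost:8099',  # assumed Flink job server endpoint
])

with beam.Pipeline(options=options) as p:
    _ = (
        p
        | 'ReadFromKafka' >> ReadFromKafka(
            consumer_config={'bootstrap.servers': 'localhost:9092'},
            topics=['load-test-input'])
        | 'Log' >> beam.Map(print))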

In portability UnboundedSources can only be supported via SDF. To still 
be able to use legacy IO which uses UnboundedSource the Runner has to 
supply this capability (which the Flink Runner supports). This will 
likely go away if we have an UnboundedSource SDF Wrapper :)
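
To make the SDF point concrete, here is a minimal splittable DoFn in the
Python SDK -- a bounded generator over an offset range, just to show the
shape of the API as it stabilised in later releases; an unbounded source
would additionally defer the remainder and report watermarks. All names
apart from the Beam classes are made up:

import apache_beam as beam
from apache_beam.io.restriction_trackers import (OffsetRange,
                                                 OffsetRestrictionTracker)
from apache_beam.transforms.core import RestrictionProvider


class CountRestrictionProvider(RestrictionProvider):
    def initial_restriction(self, element):
        # element is the number of records to generate.
        return OffsetRange(0, element)

    def create_tracker(self, restriction):
        return OffsetRestrictionTracker(restriction)

    def restriction_size(self, element, restriction):
        return restriction.size()


class GenerateSequenceFn(beam.DoFn):
    def process(self, element,
                tracker=beam.DoFn.RestrictionParam(CountRestrictionProvider())):
        position = tracker.current_restriction().start
        # Claim positions one by one; the runner may split the restriction.
        while tracker.try_claim(position):
            yield position
            position += 1


with beam.Pipeline() as p:
    _ = (p
         | beam.Create([100])
         | beam.ParDo(GenerateSequenceFn())
         | beam.Map(print))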

As for the stability of the expansion protocol, I think it's relatively
stable and won't change fundamentally.

Cheers,
Max

On 09.05.19 16:21, Łukasz Gajowy wrote:
> My recommendation is that we make sure the protocol is stable and 
> implemented on the Python SDK side before starting the Go SDK side, 
> since that work is already in progress.
> 
> +1 This is exactly the roadmap that I had in mind - start with 
> externalizing and using the synthetic sources in Python SDK and then 
> proceed with Go. Still worth knowing what's going on there so that's why 
> I asked. :)
> 
> Thanks,
> Łukasz
> 
> On Thu, May 9, 2019 at 16:03 Robert Burke <robert@frantil.com
> <ma...@frantil.com>> wrote:
> 
>     Currently the Go SDK doesn't have cross Language support
>     implemented. My recommendation is that we make sure the protocol is
>     stable and implemented on the Python SDK side before starting the Go
>     SDK side, since that work is already in progress.
> 
>       The relevant state of the Go SDK:
>     * beam.External exists, for specifying go transforms. (See the Go
>     SDK's PubSubIO for an example)
>     * the generated go code for the protos, including the Expansions
>     service API was refreshed last week. Meaning the work isn't blocked
>     on that.
> 
>     In principle, the work would be to
>     * ensure that the SDK side of job submission connects and looks up
>     relevant transforms against the Expansion service if needed, and
>     does the appropriate pipeline graph surgery.
>        *This may be something that's handled as some kind of hook and
>     registration process for generally handling external transforms SDK
>     side.
>     * Have some kind of external transform to specify and configure on
>     the Go side.
> 
>     Most of this can be ported from the Python implementation once it's
>     stabilized.
> 
>     As with all my recommendations on how to do things with the Go SDK,
>     feel free to ignore it and forge ahead. I look forward to someone
>     tackling this, whenever it happens!
> 
>     Your friendly neighborhood distributed gopher wrangler,
>     Robert Burke
> 
>     Related:
>     PR 8531 [1] begins adding automates testing of the Go SDK against
>     Flink, which should assist with ensuring this eventual work keeps
>     working.
> 
>     [1]: https://github.com/apache/beam/pull/8531
> 
>     On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lgajowy@apache.org
>     <ma...@apache.org>> wrote:
> 
>         Hi,
> 
>         part of our work that needs to be done to create tests for Core
>         Apache Beam operations is to enable both batch and streaming
>         testing scenarios in all SDKs (not only Java, so lot's of
>         portability usage here). I gathered some thoughts on how (I
>         think) this could be done at the current state of Beam:
> 
>         https://s.apache.org/portable-load-tests
> 
>         I added some questions at the end of the doc but I will paste
>         them here too for visibility:
> 
>           * What is the status of Cross Language Support for Go SDK? Is
>             it non-trivial to enable such support (if it doesn't exist yet)?
>           * According to other contributors knowledge/experience: I
>             noticed that streaming with KafkaIO is currently supported
>             by wrapping the ExternalTransform in Python SDK. Do you
>             think that streaming pipelines will "just work" with the
>             current state of portability if I do the same for
>             UnboundedSyntheticSource or is there something else missing? 
> 
>         BTW: great to see Cross Language Support happening. Thanks for
>         doing this! :)
> 
>         Thanks,
>         Łukasz
> 
> 
> 
> 

Re: Streaming pipelines in all SDKs!

Posted by Łukasz Gajowy <lu...@gmail.com>.
> My recommendation is that we make sure the protocol is stable and
> implemented on the Python SDK side before starting the Go SDK side, since
> that work is already in progress.

+1 This is exactly the roadmap that I had in mind - start with
externalizing and using the synthetic sources in Python SDK and then
proceed with Go. Still worth knowing what's going on there so that's why I
asked. :)

Thanks,
Łukasz

On Thu, May 9, 2019 at 16:03 Robert Burke <ro...@frantil.com> wrote:

> Currently the Go SDK doesn't have cross Language support implemented. My
> recommendation is that we make sure the protocol is stable and implemented
> on the Python SDK side before starting the Go SDK side, since that work
> is already in progress.
>
>  The relevant state of the Go SDK:
> * beam.External exists, for specifying go transforms. (See the Go SDK's
> PubSubIO for an example)
> * the generated go code for the protos, including the Expansions service
> API was refreshed last week. Meaning the work isn't blocked on that.
>
> In principle, the work would be to
> * ensure that the SDK side of job submission connects and looks up
> relevant transforms against the Expansion service if needed, and does the
> appropriate pipeline graph surgery.
>   *This may be something that's handled as some kind of hook and
> registration process for generally handling external transforms SDK side.
> * Have some kind of external transform to specify and configure on the Go
> side.
>
> Most of this can be ported from the Python implementation once it's
> stabilized.
>
> As with all my recommendations on how to do things with the Go SDK, feel
> free to ignore it and forge ahead. I look forward to someone tackling this,
> whenever it happens!
>
> Your friendly neighborhood distributed gopher wrangler,
> Robert Burke
>
> Related:
> PR 8531 [1] begins adding automates testing of the Go SDK against Flink,
> which should assist with ensuring this eventual work keeps working.
>
> [1]: https://github.com/apache/beam/pull/8531
>
> On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lg...@apache.org> wrote:
>
>> Hi,
>>
>> part of our work that needs to be done to create tests for Core Apache
>> Beam operations is to enable both batch and streaming testing scenarios in
>> all SDKs (not only Java, so lot's of portability usage here). I gathered
>> some thoughts on how (I think) this could be done at the current state of
>> Beam:
>>
>> https://s.apache.org/portable-load-tests
>>
>> I added some questions at the end of the doc but I will paste them here
>> too for visibility:
>>
>>    - What is the status of Cross Language Support for Go SDK? Is it
>>    non-trivial to enable such support (if it doesn't exist yet)?
>>    - According to other contributors knowledge/experience: I noticed
>>    that streaming with KafkaIO is currently supported by wrapping the
>>    ExternalTransform in Python SDK. Do you think that streaming pipelines will
>>    "just work" with the current state of portability if I do the same for
>>    UnboundedSyntheticSource or is there something else missing?
>>
>> BTW: great to see Cross Language Support happening. Thanks for doing
>> this! :)
>>
>> Thanks,
>> Łukasz
>>
>>
>>
>>
>>

Re: Streaming pipelines in all SDKs!

Posted by Robert Burke <ro...@frantil.com>.
Currently the Go SDK doesn't have cross-language support implemented. My
recommendation is that we make sure the protocol is stable and implemented
on the Python SDK side before starting the Go SDK side, since that work is
already in progress.

 The relevant state of the Go SDK:
* beam.External exists, for specifying Go transforms. (See the Go SDK's
PubSubIO for an example)
* the generated Go code for the protos, including the Expansion service
API, was refreshed last week, meaning the work isn't blocked on that.

In principle, the work would be to
* ensure that the SDK side of job submission connects and looks up relevant
transforms against the Expansion service if needed, and does the
appropriate pipeline graph surgery.
  * This may be something that's handled as some kind of hook and
registration process for generally handling external transforms SDK side.
* Have some kind of external transform to specify and configure on the Go
side.

Most of this can be ported from the Python implementation once it's
stabilized.
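
To give a feel for what would be ported: the heart of the Python
implementation is a single Expand RPC against the expansion service,
followed by splicing the returned transform and components into the local
pipeline proto. A rough sketch of that round trip (the URN, payload and
address are made up, and the generated-proto module paths have shifted
between Beam releases):

import grpc
from apache_beam.portability.api import (beam_expansion_api_pb2,
                                         beam_expansion_api_pb2_grpc,
                                         beam_runner_api_pb2)

# Hypothetical URN; the expansion service must know how to build it.
URN = 'beam:external:java:unbounded_synthetic_source:v1'

request = beam_expansion_api_pb2.ExpansionRequest(
    components=beam_runner_api_pb2.Components(),
    transform=beam_runner_api_pb2.PTransform(
        unique_name='SyntheticUnboundedRead',
        spec=beam_runner_api_pb2.FunctionSpec(urn=URN,
                                              payload=b'<encoded config>')),
    namespace='external_1')

with grpc.insecure_channel('localhost:8097') as channel:  # assumed address
    response = beam_expansion_api_pb2_grpc.ExpansionServiceStub(channel).Expand(request)

if response.error:
    raise RuntimeError(response.error)
# The SDK then grafts response.transform and response.components into its own
# pipeline proto -- the "pipeline graph surgery" mentioned above.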

As with all my recommendations on how to do things with the Go SDK, feel
free to ignore it and forge ahead. I look forward to someone tackling this,
whenever it happens!

Your friendly neighborhood distributed gopher wrangler,
Robert Burke

Related:
PR 8531 [1] begins adding automated testing of the Go SDK against Flink,
which should assist with ensuring this eventual work keeps working.

[1]: https://github.com/apache/beam/pull/8531

On Thu, May 9, 2019, 6:32 AM Łukasz Gajowy <lg...@apache.org> wrote:

> Hi,
>
> part of our work that needs to be done to create tests for Core Apache
> Beam operations is to enable both batch and streaming testing scenarios in
> all SDKs (not only Java, so lot's of portability usage here). I gathered
> some thoughts on how (I think) this could be done at the current state of
> Beam:
>
> https://s.apache.org/portable-load-tests
>
> I added some questions at the end of the doc but I will paste them here
> too for visibility:
>
>    - What is the status of Cross Language Support for Go SDK? Is it
>    non-trivial to enable such support (if it doesn't exist yet)?
>    - According to other contributors knowledge/experience: I noticed that
>    streaming with KafkaIO is currently supported by wrapping the
>    ExternalTransform in Python SDK. Do you think that streaming pipelines will
>    "just work" with the current state of portability if I do the same for
>    UnboundedSyntheticSource or is there something else missing?
>
> BTW: great to see Cross Language Support happening. Thanks for doing this!
> :)
>
> Thanks,
> Łukasz
>
>
>
>
>