Posted to dev@beam.apache.org by Damon Douglas <do...@gmail.com> on 2020/03/03 15:00:21 UTC

Golang Apache Beam volunteer to help

Hello Robert,

I enjoyed meeting some of your colleagues last night at the Seattle Apache
Beam Meetup at the new Google building.  I found your contact information
via https://beam.apache.org/roadmap/go-sdk/.  It suggested I contact you to
see where I could possibly volunteer to help with the Apache Beam Go SDK
project.  I was wondering if I may chat with you for just a few minutes
over Google Hangout/Meet to see where I could get started.

A bit about me: I live in Seattle and currently work for Dito, a Google
Cloud partner, engineering data pipelines using Dataflow and Kubernetes.
Prior to joining Dito, I worked as a pharmacist for 10 years before
transitioning into software development, and I am now pursuing my computer
science degree.

I hope you and your family stay safe and healthy.

Best,

Damon
(201) 888-3702

Re: Golang Apache Beam volunteer to help

Posted by Robert Burke <re...@google.com>.
Nice to meet you, Damon!

Had I not been on a plane last night I'd have been at the event, so I'm
sorry I missed you.

I'm writing to let you know that I'm interested in chatting, but that I'm
currently tying up loose ends before going on vacation without internet for
a week (departing on March 5th, returning on March 16th).

What I can say is that the roadmap is a bit out of date, and many of those
things have made progress or are completed. I intend to update that by the
end of March.

Learning what you'd like to help with (or what would be useful for you)
would be a great starting point for narrowing things down. Batch Beam
features? Streaming Beam features? General bugfixing and improvements?
Infrastructure? Go ecosystem changes? Anything at all?

One place to start that is streaming-focused, but would also make you more
familiar with the execution stack, is Windowing.

The SDK nominally supports it
<https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam#WindowInto>, and
we have a single example
<https://github.com/apache/beam/blob/master/sdks/go/examples/windowed_wordcount/windowed_wordcount.go>,
but no meaningful testing around it. Having a unit test that validates
windowing is passed through the execution stack
<https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/window.go>
properly would go a long way to resolving this. You can see a bit of what I
mean with the pardo test
<https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/pardo_test.go#L50>
in that package. From there, adding Session Windowing
<https://issues.apache.org/jira/browse/BEAM-4152> would show you how the
SDK propagates configuration from the user SDK side, through to the runner,
and into the execution stack. There was a PR
<https://github.com/apache/beam/pull/8111> that tried to do this last year,
but the contributor dropped away.
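
For anyone following along who has not worked with windowing before, here
is a minimal standalone sketch of the semantics such a test would need to
check. It does not use the Beam Go SDK: Window, AssignFixed, and
MergeSessions are hypothetical names made up for illustration, timestamps
are plain int64 milliseconds, and the merge rule (sessions merge when their
half-open intervals overlap) is an assumption about how session windows
would behave.

```go
package main

import (
	"fmt"
	"sort"
)

// Window is a hypothetical half-open interval [Start, End) in milliseconds.
// It is not the Beam SDK's window type.
type Window struct {
	Start, End int64
}

// AssignFixed returns the fixed window of width size that contains ts.
// Assumes ts >= 0 and size > 0.
func AssignFixed(ts, size int64) Window {
	start := ts - ts%size
	return Window{Start: start, End: start + size}
}

// MergeSessions gives every timestamp a proto-window [ts, ts+gap) and
// merges proto-windows that overlap, returning the resulting sessions
// ordered by start time. Assumes gap > 0.
func MergeSessions(timestamps []int64, gap int64) []Window {
	if len(timestamps) == 0 {
		return nil
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]int64(nil), timestamps...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })

	merged := []Window{{Start: sorted[0], End: sorted[0] + gap}}
	for _, ts := range sorted[1:] {
		last := &merged[len(merged)-1]
		if ts < last.End {
			// Falls inside the current session: extend its end.
			last.End = ts + gap
			continue
		}
		// Gap exceeded: start a new session.
		merged = append(merged, Window{Start: ts, End: ts + gap})
	}
	return merged
}

func main() {
	fmt.Println(AssignFixed(125, 60))
	fmt.Println(MergeSessions([]int64{10, 15, 100, 12}, 10))
}
```

A real test in the exec package would feed elements through the actual
window nodes rather than helpers like these, but the assertions would have
the same shape: given input timestamps, check the windows that come out.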

A more infrastructural change would be to provide clear and concise
instructions for running Go SDK pipelines on the Python Portable runner.
From there it's possible to extend the existing portable test suites (on
Flink, Spark, and Dataflow) to also run against that runner. This is
valuable as a small, local runner that has the correct semantics. Making it
so it's easy for Go users to run against that runner would help them test
their pipelines without needing to spend money on a cluster of machines, or
requiring Java.

A good stepping stone from that would be to fix/expand the semantics of the
Go Direct runner, which has issues and bugs since it doesn't materialize
data. Many bugs would be caught if this were resolved, and Go users would
no longer need the Python runner. One could be ambitious and make a
purely Go portable runner in the style of the Python Portable runner, but I
definitely recommend starting with the Python runner, then working on
improving the Go runner, to validate that the semantics are correct.

Let me know what's interesting to you!  Right now I'm working on PCollection
and DoFn metrics <https://github.com/apache/beam/pull/10942> on the
execution side, and Daniel Oliveira is working on Splittable DoFn support
<https://github.com/apache/beam/pull/10991>. Once I'm done with that, I'll
be working on integrating Beam Schemas into the Go SDK, and having
additional perspectives during review would be valuable.

Let me know what you think!
Robert Burke
