Posted to dev@beam.apache.org by Robert Burke <lo...@apache.org> on 2021/06/11 00:04:09 UTC

[Proposal] Go SDK Exits Experimental

Hello Beam Community!

I propose we stop calling the Apache Beam Go SDK experimental.

This thread is to discuss it as a community, and any conditions that remain
that would prevent the exit.

*tl;dr;*
*Ask questions for answers and links! I have both.*
This entails including it officially in the release process, removing the
various "experimental" text throughout the repo, etc.,
and otherwise treating it like Python and Java. There are also some
Go-specific tasks around dependency versioning.

The Go SDK implements the Beam model efficiently for most batch tasks,
including basic windowing.
Apache Beam Go jobs can execute on, and are tested against, all portable
runners.
The core APIs are not going to change in incompatible ways going forward.
Scalable transforms can be written through SplittableDoFns or via Cross
Language transforms.

The SDK isn't 100% feature complete, but keeping it experimental doesn't
help with that any further.
Communities grow through contributions and use, and experimental markers
dissuade users.
There's plenty to do in order to expand what can be done with the SDK.
(Contributions welcome)

*Why Exit Experimental now?*

Typically when we call an SDK or API experimental, it's because there's a
risk that the API or its behavior may change significantly.
This, in turn, leads to additional work for users of the SDK on every
release, which leads to sticking to older versions or forking
to preserve behavior. Version updates should be looked forward to, and
viewed as having little risk. Further, while there has been
previous discussion about what the "low bar" is for a new SDK, it hasn't
been explicitly applied to the Go SDK. I feel this has
hurt the development of, and contributions to, new SDK languages (the
inherent difficulty of SDK development notwithstanding).

When the SDK was designed, it wasn't entirely clear what the Beam model
should look like in an opinionated language like Go.
The original design (see https://s.apache.org/beam-go-sdk-design-rfc [0])
goes into detail about what it means for a language without
generics, overloading, or inheritance to implement the Beam model. One
could largely throw away static types (like Python),
but this approach rings hollow for Go. It would not do if the approach
couldn't grow and scale to the Beam model. It's also hard
to tell if an API is any good before there are users.

Further, in the early days of Portability, there wasn't a way to write
scalable DoFns, dynamically or otherwise. It's an incredible
bottleneck to need to do all initial fanout of work on a single machine
and write everything to a Reshuffle, just in order to scale up.
Without being able to scale, Beam is little more than overhead.

At this point, both of these needs are met within the Go SDK for open
source.

*Background*

The Go SDK has been a part of the beam repo for a few years now, since it
was accidentally merged into master.
Since then it's been called experimental, and not officially part of the
releases.

Of the SDKs, it is the one that was always designed around Beam Portability
first. It never had any "legacy" (SDK x runner specific) workers.
It has always used the Beam Pipeline protos and the FnAPI to execute jobs,
first with some very experimental code on Dataflow, but now
on all portable supported runners, like Flink, Spark, the Python Portable
runner, and Dataflow.

*API Stability*

The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline
construction since it was first merged in, and there are no
changes to that on the horizon that can't be made in a backwards-compatible
manner. Largely these are related to new features, or
usability improvements enabled by the advent of Go generics (think of
"real" KV, emitter, and iterator types).
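For readers less familiar with the current API: a Go SDK DoFn receives grouped values and produces outputs through plain function-typed parameters, such as an iterator of the form `func(*V) bool` and an emitter of the form `func(K, V)`. The standard-library sketch below (Go 1.18+) contrasts that style with the kind of generic KV, emitter, and iterator types mentioned above. The `KV`, `Emitter`, and `Iter` types and the `sumPerKey` function are hypothetical illustrations, not part of the Beam API, and no pipeline is constructed.

```go
package main

import "fmt"

// sumCurrent uses today's style: grouped values arrive through an iterator
// function and results leave through an emitter function.
func sumCurrent(key string, values func(*int) bool, emit func(string, int)) {
	total, v := 0, 0
	for values(&v) {
		total += v
	}
	emit(key, total)
}

// Hypothetical generic helper types (not Beam API).
type KV[K, V any] struct {
	Key   K
	Value V
}

type Emitter[T any] func(T)

type Iter[T any] func(*T) bool

// sumPerKey is a hypothetical generic-typed equivalent of sumCurrent.
func sumPerKey(key string, values Iter[int], emit Emitter[KV[string, int]]) {
	total, v := 0, 0
	for values(&v) {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

// sliceIter adapts a slice to the func(*int) bool iterator shape.
func sliceIter(xs []int) func(*int) bool {
	i := 0
	return func(out *int) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

func main() {
	sumCurrent("a", sliceIter([]int{1, 2, 3}), func(k string, v int) {
		fmt.Println(k, v) // a 6
	})
	sumPerKey("b", sliceIter([]int{4, 5}), func(kv KV[string, int]) {
		fmt.Println(kv.Key, kv.Value) // b 9
	})
}
```

Because the generic types have the same underlying function types, existing function-typed arguments remain assignable to them, which is one reason such additions could plausibly be made without breaking current users.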

It's an open secret that the Go SDK has largely been developed for use
within Google. Its internal use is called FlumeGo, representing
the Apache Beam Go SDK running on top of Flume, Google's batch pipeline
processing engine. Thus, most of the focus has been on improving
batch execution. FlumeGo sees ample use today, and there hasn't been a call
for fundamental changes to the API for ergonomic or
usability concerns.

*Scalability*

Google could get away without the Go SDK having an SDK-side scalability
solution as a result of its integration with Flume.
However, those days are now past.

The Go SDK now supports SplittableDoFns along with dynamic splitting, which
enables writing scalable batch transforms natively
in the Go SDK.
The SDK also supports cross-language transforms, with Beam Schema
encodings. With them, production-hardened transforms
from Java and Python are a wrapper away.
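Dynamic splitting is what lets a runner rebalance work while a SplittableDoFn is still running. The standard-library sketch below models the idea with a hypothetical offset-range restriction and tracker: the DoFn claims offsets one at a time, and a simulated runner request splits off the unclaimed tail as a residual that another worker could process. The `offsetRange` and `tracker` types are illustrative assumptions, not the Go SDK's actual restriction tracker API.

```go
package main

import "fmt"

// offsetRange is a hypothetical half-open restriction [Start, End) over
// element offsets, such as line or byte positions in a file.
type offsetRange struct {
	Start, End int64
}

// tracker is a simplified, hypothetical restriction tracker. The DoFn claims
// offsets in increasing order; a runner may split off the unclaimed tail.
type tracker struct {
	rest    offsetRange
	claimed int64 // last successfully claimed offset; Start-1 before any claim
}

func newTracker(r offsetRange) *tracker {
	return &tracker{rest: r, claimed: r.Start - 1}
}

// TryClaim reports whether the DoFn may process offset pos. A false result
// means pos is outside the current restriction, so processing should stop.
func (t *tracker) TryClaim(pos int64) bool {
	if pos <= t.claimed || pos < t.rest.Start || pos >= t.rest.End {
		return false
	}
	t.claimed = pos
	return true
}

// TrySplit keeps roughly the given fraction of the unclaimed work in the
// primary restriction and returns the remainder as a residual restriction.
func (t *tracker) TrySplit(fraction float64) (offsetRange, bool) {
	next := t.claimed + 1
	remaining := t.rest.End - next
	if remaining < 2 {
		return offsetRange{}, false // too little unclaimed work to split
	}
	splitAt := next + int64(float64(remaining)*fraction)
	if splitAt <= next {
		splitAt = next + 1 // keep at least one unclaimed offset in the primary
	}
	if splitAt >= t.rest.End {
		return offsetRange{}, false
	}
	residual := offsetRange{Start: splitAt, End: t.rest.End}
	t.rest.End = splitAt
	return residual, true
}

func main() {
	t := newTracker(offsetRange{Start: 0, End: 10})
	var processed []int64
	var residual offsetRange
	for pos := int64(0); t.TryClaim(pos); pos++ {
		processed = append(processed, pos)
		if pos == 2 {
			// Simulate the runner asking for a dynamic split mid-bundle.
			if r, ok := t.TrySplit(0.5); ok {
				residual = r
			}
		}
	}
	fmt.Println("processed:", processed) // processed: [0 1 2 3 4 5]
	fmt.Println("residual:", residual)   // residual: {6 10}
}
```

The key design point is that the split only ever hands away unclaimed work, so the primary and residual never process the same offset.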

Presently, Daniel Oliveira (who implemented the SDF side work, and
completed the xlang work) is adding a wrapper for the
Java Kafka IO using cross-language transforms, which has often been
requested. This will also enable use of the Beam SQL
transforms that Java provides.

*Features*

The Go SDK implements the Beam core. It implements the standard
coders, allows for user DoFns and CombineFns, and provides access
to core transforms like Flatten and GroupByKey, and to features like side
inputs, windowing, and user metrics.
Basic windowing will be fully supported for batch, even through lifted
combines, in the 2.32.0 release.
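Combiner lifting is what makes combines scale: each bundle is pre-combined into a small accumulator before the GroupByKey shuffle, so only accumulators need to be merged afterwards. The sketch below models a single key in a single window with the standard library; `meanFn` is a hypothetical example whose methods mirror the CreateAccumulator, AddInput, MergeAccumulators, and ExtractOutput shape of a Beam CombineFn, but nothing here calls the Beam API.

```go
package main

import "fmt"

// meanAccum is the small intermediate state that crosses the shuffle.
type meanAccum struct {
	Sum   float64
	Count int
}

// meanFn is a hypothetical combine function following the four-method
// CombineFn shape, implemented with plain Go types.
type meanFn struct{}

func (meanFn) CreateAccumulator() meanAccum { return meanAccum{} }

func (meanFn) AddInput(a meanAccum, x float64) meanAccum {
	return meanAccum{Sum: a.Sum + x, Count: a.Count + 1}
}

func (meanFn) MergeAccumulators(a, b meanAccum) meanAccum {
	return meanAccum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAccum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// liftedMean simulates lifting: each bundle is pre-combined into one
// accumulator, and only the per-bundle accumulators are merged afterwards.
func liftedMean(bundles [][]float64) float64 {
	fn := meanFn{}
	var partials []meanAccum
	for _, b := range bundles {
		acc := fn.CreateAccumulator()
		for _, x := range b {
			acc = fn.AddInput(acc, x)
		}
		partials = append(partials, acc)
	}
	merged := fn.CreateAccumulator()
	for _, p := range partials {
		merged = fn.MergeAccumulators(merged, p)
	}
	return fn.ExtractOutput(merged)
}

func main() {
	fmt.Println(liftedMean([][]float64{{1, 2, 3}, {4, 5}, {6}})) // 3.5
}
```

Lifting only works because MergeAccumulators is associative and commutative, which is why a mean carries a (sum, count) pair rather than a running average.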

All of the above enables Beam Go to be versatile for batch execution on
portable runners, and for simple streaming pipelines.

*Repo Testing*

On precommit, the Go SDK runs all its unit tests. On top of that, it runs
all its integration tests against the Python Portable runner,
which detects breaking changes quickly and reliably without overspending
community resources. Those same tests are also
run against Dataflow, Flink, and Spark.

The tests are executable against all runners via the appropriate Go
commands (if you've stood up your own job management server),
or Gradle commands (which will spin up runner instances for you).
Documentation for executing tests and adding new ones
is on the wiki [2]. The tests are accessible to Go developers, as they're
implemented with the standard Go testing tools.

*Shortcomings*

That said, there's still much to do. Let me briefly tell you what doesn't
work; it's up to you to weigh whether these gaps block
exiting experimental status.

At present, only textio has been implemented as a Splittable DoFn.
Once the Kafka wrapper is merged in, it will serve as the first example
for future contributions of
new transform wrappers for the Go SDK.
Transforms and IOs are lacking, but at this point users are empowered to
write their own DoFns or wrap existing transforms for cross-language use.

In the core SDK, more streaming-focused features have yet to be
implemented, but they're largely additions to what exists already
rather than total rebuilds. Much of the work is defining how a user
specifies what they want, and turning that into the appropriate
FnAPI requests at execution time. Back in October, I wrote at length on the
wiki [1] about what's missing for additional streaming features.

While we have bolstered our testing recently, there's likely still more we
could test to improve our confidence in the SDK,
in particular regarding the included transform libraries and examples.

*Moving Forward*

My immediate plan is to work on incorporating the Go SDK fully into the
Beam Programming Guide. I've audited the guide [3], and
am beginning to add missing content and fill in the Go-specific gaps.
This will be tied to improving the GoDoc with more Go-specific
user documentation that isn't appropriate for the BPG,
and to resolving the LICENSE issue around the public display of that GoDoc.

If this proposal is accepted by a binding vote, I will incorporate the SDK
into the release process, and remove the "experimental"
language around the SDK. This largely entails updating the release scripts
to also build and publish the Go SDK Docker containers.
As for releasing the code, we're technically already doing so whenever we
tag a release branch [4].

The clearest signal to the Go community, however, will be migrating the SDK
to use Go Modules for dependency version control,
which Daniel is planning to work on after his Kafka task. This will put
our repo infrastructure, SDK contributors, and users
on the same footing when it comes to dependency management. It will remove
the "+incompatible" tags one sees on the
pkg.go.dev versions list at [4].
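The "+incompatible" suffix comes from the go command's rules for mapping repository tags to module versions. A second rule became relevant once the module file was placed in the sdks/ directory (as discussed later in this thread): per the Go Modules reference, a module defined in a repository subdirectory is versioned by tags prefixed with that subdirectory. The standard-library sketch below models both rules in simplified form; it ignores other cases, such as the requirement that a v2+ module path end in a /vN suffix, and the helper names are hypothetical rather than part of any Go tool.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// tagForModule returns the tag name a module version needs when the module's
// go.mod lives in subdir of the repository: the tag is prefixed with the
// subdirectory and a slash. An empty subdir means the repository root.
func tagForModule(subdir, version string) string {
	if subdir == "" {
		return version
	}
	return strings.TrimSuffix(subdir, "/") + "/" + version
}

// majorAtLeastTwo reports whether a "vMAJOR.MINOR.PATCH" version has a
// major version of 2 or higher.
func majorAtLeastTwo(version string) bool {
	major, _, _ := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	n, err := strconv.Atoi(major)
	return err == nil && n >= 2
}

// resolvedVersion models the "+incompatible" rule in simplified form: a
// v2-or-higher tag on a repository without a go.mod file is reported with
// a "+incompatible" suffix.
func resolvedVersion(version string, hasGoMod bool) string {
	if !hasGoMod && majorAtLeastTwo(version) {
		return version + "+incompatible"
	}
	return version
}

func main() {
	// Before Go Modules: a plain root-level tag and no go.mod file.
	fmt.Println(resolvedVersion("v2.32.0", false)) // v2.32.0+incompatible
	// With a go.mod in sdks/: the go command looks for a prefixed tag.
	fmt.Println(tagForModule("sdks", "v2.33.0")) // sdks/v2.33.0
}
```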

I'm very happy to answer any questions you might have about the SDK, and
provide additional links as needed. I intentionally avoided
a link barrage in this email, as links can distract from the point: the SDK
is ready for folks to use, and we need to tell them that they can,
rather than that they shouldn't.

Robert Burke
Defacto Beam Go TL

[0] https://s.apache.org/beam-go-sdk-design-rfc
[1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
[2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
[3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
[4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions

Re: [Proposal] Go SDK Exits Experimental

Posted by Ahmet Altay <al...@google.com>.
On Wed, Nov 24, 2021 at 10:27 AM Sachin Agarwal <sa...@google.com> wrote:

> We have also started work with the Google Golang devrel team to work on
> messaging once the batteries are “included”.  They’re super excited!
>

Thank you!


>
> On Wed, Nov 24, 2021 at 10:24 AM Ahmet Altay <al...@google.com> wrote:
>
>> Great. I retweeted that on the official account.
>>
>> On Wed, Nov 24, 2021 at 10:14 AM Robert Burke <ro...@frantil.com> wrote:
>>
>>> I think so.
>>>
>>> My tweet [1] on the topic got a bit of traction even without the
>>> official beam account boosting it.
>>>
>>> [1] https://twitter.com/lostluck/status/1456720240092467200?t=owVJd6ZuTVMUkNyvYNr4Xg&s=19
>>>
>>> On Wed, Nov 24, 2021, 10:11 AM Ahmet Altay <al...@google.com> wrote:
>>>
>>>> Thank you Rebo, and congratulations to everyone working on Go SDK :)
>>>>
>>>> @Robert Burke <re...@google.com> @Brittany Hermann <he...@google.com>
>>>>  @Sachin Agarwal <sa...@google.com> - Should we share this on
>>>> Beam's twitter and other social media pages?
>>>>
>>>> On Fri, Nov 5, 2021 at 1:29 PM Robert Burke <ro...@frantil.com> wrote:
>>>>
>>>>> It's my great pleasure to announce that the Apache Beam Go SDK is no
>>>>> longer experimental. https://beam.apache.org/blog/go-sdk-release/
>>>>>
>>>>> Thank you everyone.
>>>>> Robert Burke
>>>>> Beam Go Busybody
>>>>>
>>>>> On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:
>>>>>
>>>>>> At this point I just need an LGTM on the blog post PR, as the draft
>>>>>> is finalized.
>>>>>>
>>>>>> Udi added the sdks/v2.33.0 tag which works as expected. I've also
>>>>>> verified that the appropriate container is used by default when not
>>>>>> specified which is the last unknown in this process.
>>>>>>
>>>>>> Who's ready to release a new SDK? I am!
>>>>>>
>>>>>>  https://github.com/apache/beam/pull/15894 (or join the exciting
>>>>>> reaction emoji on the top post).
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>>>>>>
>>>>>>> The current draft of the exit blog post is
>>>>>>> https://github.com/apache/beam/pull/15894
>>>>>>> Comments are very welcome. I'm going to continue looking for Known
>>>>>>> issues (which will be linked to their respective JIRAs) tomorrow.
>>>>>>>
>>>>>>> Since RC1 is getting cycled, I can also go back to the original plan
>>>>>>> of v2.33.0, if we'd like to get it out this week.
>>>>>>>
>>>>>>>
>>>>>>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>>>>>>
>>>>>>>> Investigation yielded that there's no way around the prefixed tags.
>>>>>>>> The JIRA has been commented with the explanation.
>>>>>>>>
>>>>>>>> https://github.com/apache/beam/pull/15881 has the release script
>>>>>>>> updates.
>>>>>>>>
>>>>>>>> I'm working on the Exit blogpost and the updated Go SDK roadmap.
>>>>>>>> The draft PR will be linked here.
>>>>>>>>
>>>>>>>> Since 2.34.0 is almost out (assuming RC1 verification goes well)
>>>>>>>> I'm inclined to wait for that release to finish before publishing the
>>>>>>>> blogpost. I'll link the draft PR here as soon as it's ready.
>>>>>>>>
>>>>>>>> Once 2.34.0 is released, I'm inclined to also add a prefixed tag for
>>>>>>>> 2.33.0, so there isn't a gap in versions between the code published
>>>>>>>> before and after the move to Go Modules.
>>>>>>>>
>>>>>>>> Once published, that'll be the end of this thread.
>>>>>>>>
>>>>>>>> Thank you very much everyone.
>>>>>>>>
>>>>>>>> Robert Burke
>>>>>>>> Beam Go Busybody
>>>>>>>>
>>>>>>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>>>>>>>>
>>>>>>>>> +1 to extra tags. They'll be trivial to add to our release
>>>>>>>>> process, and git tags are lightweight by design so I don't foresee any
>>>>>>>>> problems.
>>>>>>>>>
>>>>>>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <robertwb@google.com> wrote:
>>>>>>>>>
>>>>>>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>>>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>>>>>>> too much of a problem (until, hopefully, it's fixed on the Go side).
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
>>>>>>>>>> >
>>>>>>>>>> > With Kyle's help with the additional tagging of the next RC, we have validated that this is currently the correct approach.
>>>>>>>>>> >
>>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>>>>>>> >
>>>>>>>>>> > Or even:
>>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to the latest tagged version)
>>>>>>>>>> >
>>>>>>>>>> > The main cost of this approach is doubling the number of tags in the tags list (https://github.com/apache/beam/tags), which is not ideal, but overall a small cost. There's no need for a "full publish" of these additional tags, so we won't be doubling our "releases" (see https://github.com/apache/beam/releases).
>>>>>>>>>> >
>>>>>>>>>> > I'll still be filing a bug against the Go commands, since the mandatory prefixing is unintuitive and seems unnecessary. If it stops being necessary, we can always delete the tags from the affected branches and cease the behavior going forward. I'll search through the existing Go issues first, however, to see if this has been previously discussed, and report my findings here either way.
>>>>>>>>>> >
>>>>>>>>>> > This does require 2 small changes to the release guide: the RC tagging script, and the final tagging:
>>>>>>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>>>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>>>>>>> >
>>>>>>>>>> > I'll make this change later this week (or early next), assuming there are no objections.
>>>>>>>>>> >
>>>>>>>>>> > Thank you all very much for your patience,
>>>>>>>>>> > Robert Burke
>>>>>>>>>> > Beam Go Busybody
>>>>>>>>>> >
>>>>>>>>>> >
>>>>>>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>>>>> > > After much research into the Go Modules documentation, I have confirmed what the issue is.
>>>>>>>>>> > >
>>>>>>>>>> > > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change, it captures the Java and Python container boot code (written in Go) in the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.
>>>>>>>>>> > >
>>>>>>>>>> > > However, we missed that, when doing so, the standard version tags only apply to modules at the root level, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version; quoting the important paragraph:
>>>>>>>>>> > >
>>>>>>>>>> > > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of
>>>>>>>>>> > > > the module path is not empty, then each tag name must be prefixed with the module subdirectory,
>>>>>>>>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls
>>>>>>>>>> > > > subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must
>>>>>>>>>> > > > have the tag named gopls/v0.4.0 in that repository.
>>>>>>>>>> > >
>>>>>>>>>> > > Specifically, for the Go SDK to be fetchable at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>>>>>>> > >
>>>>>>>>>> > > So, the fix for the Go versioning issue is to amend our release process (including generating release candidate builds) to also add a prefixed version tag with the same version.
>>>>>>>>>> > >
>>>>>>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can backfill the 2.33.0 release branch with such a prefixed tag. At that point I can also write the official Experimental Exit blog post.
>>>>>>>>>> > >
>>>>>>>>>> > > Thank you all for your patience.
>>>>>>>>>> > > Robert Burke
>>>>>>>>>> > >
>>>>>>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>>>>>>> > > >
>>>>>>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>>>> > > >
>>>>>>>>>> > > > > This is a status update.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > At this point 2.33.0 is released, but there are difficulties with accessing the tagged versions using the standard Go tools. It's currently under investigation.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy` populates the go.mod file with a pseudo-version rather than the latest tag (v2.33.0). For example, the line looks like:
>>>>>>>>>> > > > >
>>>>>>>>>> > > > >     require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > While this will work, it's not the desired experience for users at this point. The current downside is that, for some reason, the releases are not meaningful targets. However, we retain the other benefits of Go Modules (actual dependency versioning, management by the Go tools).
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > The issue is some combination of the Go tooling [A], the fact that we added a go.mod file outside of the repo root [B], and the fact that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > [B] From the Go documentation, this should be legal and fine, even if it's not recommended. This is fortunate, because a go.mod at the root of the repo would have played poorly with the root vendor directory, which the Go tools have opinions on.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > [C] The Go Modules documentation recommends incrementing the major version when transitioning to Go Modules. However, it never said this was required, nor did it indicate this failure mode. If anything, this should be documented in those docs, if it's not another bug. We would not necessarily want to declare a global v3 for Beam at this time just for the Go SDK; it would become confusing rather quickly. Notionally, there are some larger breaking changes the Java and Python SDKs would want to make in such an event, so it's a larger conversation that is out of scope at this time.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root Go module to be inherited from the parent, not wholesale ignored. As a result, I'll be filing a bug against the Go tools to determine this, and see what paths forward exist.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > It's my hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > Thank you for your patience, and time.
>>>>>>>>>> > > > > Robert Burke
>>>>>>>>>> > > > > Beam Go Busybody
>>>>>>>>>> > > > >
>>>>>>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions [2].
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go modules of the Go SDK code katas [3], which can be updated once 2.33.0 has been released.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > Cheers,
>>>>>>>>>> > > > > > Robert Burke
>>>>>>>>>> > > > > > Defacto Beam Go TL.
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>>>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>>>>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>>>>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>>>>>>> > > > > >
>>>>>>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>>>>>>>>> > > > > > > +1, congratulations & thank you!
>>>>>>>>>> > > > > > >
>>>>>>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>>>> > > > > > >
>>>>>>>>>> > > > > > > > Regarding the documentation update: the initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
>>>>>>>>>> > > > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
>>>>>>>>>> > > > > > > >
>>>>>>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <robert@frantil.com> wrote:
>>>>>>>>>> > > > > > > > > Yup!
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG.
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > My audit of the guide is here:
>>>>>>>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > The other sheets focus on features and tests. The feature page looks worse than it is, as it was more productive to focus on what isn't available than on what is. That's a snapshot of my actual working sheet, but I'll be updating it as needed.
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > > Oops, I forgot to ask one question. Will this come with revamped website instructions/docs for Go too?
>>>>>>>>>> > > > > > > > > >
>>>>>>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>>>>>>> > > > > > > > > >
>>>>>>>>>> > > > > > > > > > > Huge +1
>>>>>>>>>> > > > > > > > > > >
>>>>>>>>>> > > > > > > > > > > This is definitely something many people have asked about, so it is great to see it finally happening.
>>>>>>>>>> > > > > > > > > > >
>>>>>>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth
>>>>>>>>>> Knowles <
>>>>>>>>>> > > > > kenn@apache.org>
>>>>>>>>>> > > > > > > > wrote:
>>>>>>>>>> > > > > > > > > > > >
>>>>>>>>>> > > > > > > > > > > > +1 awesome
>>>>>>>>>> > > > > > > > > > > >
>>>>>>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert
>>>>>>>>>> Burke <
>>>>>>>>>> > > > > lostluck@apache.org
>>>>>>>>>> > > > > > > > >
>>>>>>>>>> > > > > > > > > > wrote:
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll
>>>>>>>>>> aim to get those (Go
>>>>>>>>>> > > > > > > > modules
>>>>>>>>>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut,
>>>>>>>>>> and certainly
>>>>>>>>>> > > > > before the
>>>>>>>>>> > > > > > > > 2.33
>>>>>>>>>> > > > > > > > > > cut if release images aren't added to the 2.32
>>>>>>>>>> process.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in
>>>>>>>>>> the future, we may
>>>>>>>>>> > > > > want a
>>>>>>>>>> > > > > > > > > > harder break between a newer Generic first API
>>>>>>>>>> and the
>>>>>>>>>> > > > > current
>>>>>>>>>> > > > > > > > version,
>>>>>>>>>> > > > > > > > > > but there's no rush. Generics/TypeParameters in
>>>>>>>>>> Go aren't
>>>>>>>>>> > > > > identical to
>>>>>>>>>> > > > > > > > the
>>>>>>>>>> > > > > > > > > > feature referred to by that term in Java, C++,
>>>>>>>>>> Rust, etc, so
>>>>>>>>>> > > > > it'll
>>>>>>>>>> > > > > > > > take a
>>>>>>>>>> > > > > > > > > > bit of time for that expertise to develop.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> However, by the current nature of Go, we
>>>>>>>>>> had to have pretty
>>>>>>>>>> > > > > > > > > > sophisticated reflective analysis to handle
>>>>>>>>>> DoFns and map them
>>>>>>>>>> > > > > to their
>>>>>>>>>> > > > > > > > > > graph inputs. So, adding new helpers like a KV,
>>>>>>>>>> emitter, and
>>>>>>>>>> > > > > Iterator
>>>>>>>>>> > > > > > > > > > types, shouldn't be too difficult. Changing Go
>>>>>>>>>> SDK internals to
>>>>>>>>>> > > > > use
>>>>>>>>>> > > > > > > > > > generics (like the implementation of Stats
>>>>>>>>>> DoFns like Min, Max,
>>>>>>>>>> > > > > etc)
>>>>>>>>>> > > > > > > > would
>>>>>>>>>> > > > > > > > > > also be able to be made transparently to most
>>>>>>>>>> users, and
>>>>>>>>>> > > > > certainly any
>>>>>>>>>> > > > > > > > of
>>>>>>>>>> > > > > > > > > > the framework for execution time handling (the
>>>>>>>>>> "worker's SDK
>>>>>>>>>> > > > > harness")
>>>>>>>>>> > > > > > > > > > would be able to be cleaned up if need be.
>>>>>>>>>> Finally, adding more
>>>>>>>>>> > > > > > > > > > sophisticated DoFn registration and code
>>>>>>>>>> generation would be
>>>>>>>>>> > > > > able to
>>>>>>>>>> > > > > > > > > > replace the optional code generator entirely,
>>>>>>>>>> saving some users
>>>>>>>>>> > > > > a `go
>>>>>>>>>> > > > > > > > > > generate` step, simplifying getting improved
>>>>>>>>>> execution
>>>>>>>>>> > > > > performance.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Changing things like making a Type
>>>>>>>>>> Parameterized
>>>>>>>>>> > > > > PCollection,
>>>>>>>>>> > > > > > > > would
>>>>>>>>>> > > > > > > > > > be far more involved, as would trying to use
>>>>>>>>>> some kind of Apply
>>>>>>>>>> > > > > > > > format. The
>>>>>>>>>> > > > > > > > > > lack of Method Overrides prevents the apply
>>>>>>>>>> chaining approach.
>>>>>>>>>> > > > > Or at
>>>>>>>>>> > > > > > > > least
>>>>>>>>>> > > > > > > > > > prevents it from working simply.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Finally, Go Generics won't be available
>>>>>>>>>> until Go 1.18,
>>>>>>>>>> > > > > which isn't
>>>>>>>>>> > > > > > > > > > until next year. See
>>>>>>>>>> https://blog.golang.org/generics-proposal
>>>>>>>>>> > > > > for
>>>>>>>>>> > > > > > > > > > details.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17
>>>>>>>>>> does include a
>>>>>>>>>> > > > > Register
>>>>>>>>>> > > > > > > > > > calling convention, leading to a modest
>>>>>>>>>> performance improvement
>>>>>>>>>> > > > > across
>>>>>>>>>> > > > > > > > the
>>>>>>>>>> > > > > > > > > > board.
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> Cheers,
>>>>>>>>>> > > > > > > > > > > >> Robert Burke
>>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>>>>>>>>>> > > > > robertwb@google.com>
>>>>>>>>>> > > > > > > > wrote:
>>>>>>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules
>>>>>>>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs to support every feature to
>>>>>>>>>> > > > > > > > > > > >> > be accepted, especially now that we can do cross-language transforms, and
>>>>>>>>>> > > > > > > > > > > >> > Go definitely supports enough to be quite useful. (WRT streaming, my
>>>>>>>>>> > > > > > > > > > > >> > understanding is that Go supports the streaming model with windows and
>>>>>>>>>> > > > > > > > > > > >> > timestamps, and runs fine on a streaming runner, even if more advanced
>>>>>>>>>> > > > > > > > > > > >> > features like state and timers aren't yet available.)
>>>>>>>>>> > > > > > > > > > > >> >
>>>>>>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>>>>>>> > > > > > > > > > > >> >
>>>>>>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
>>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are
>>>>>>>>>> > > > > > > > > > > >> > > completed and the LICENSE issue is resolved. I don't think that lacking
>>>>>>>>>> > > > > > > > > > > >> > > streaming support is a blocker. The other thing I checked was whether
>>>>>>>>>> > > > > > > > > > > >> > > there were metrics available on metrics.beam.apache.org, specifically for
>>>>>>>>>> > > > > > > > > > > >> > > measuring code health via post-commit over time, which there are, and the
>>>>>>>>>> > > > > > > > > > > >> > > passing test rate is high (Huzzah!). The one thing that surprised me from
>>>>>>>>>> > > > > > > > > > > >> > > your summary is that when Go introduces generics it won't result in any
>>>>>>>>>> > > > > > > > > > > >> > > backwards incompatible changes in Apache Beam. That's great news, but does
>>>>>>>>>> > > > > > > > > > > >> > > it mean there will be a need to support both non-generic and generic APIs
>>>>>>>>>> > > > > > > > > > > >> > > moving forward? It seems like generics will be introduced in the Go 1.17
>>>>>>>>>> > > > > > > > > > > >> > > release (optimistically) in August this year.
>>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a community, and any conditions that
>>>>>>>>>> > > > > > > > > > > >> > >> remain that would prevent the exit.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> tl;dr;
>>>>>>>>>> > > > > > > > > > > >> > >> Ask questions for answers and links! I have both.
>>>>>>>>>> > > > > > > > > > > >> > >> This entails including it officially in the release process, removing the
>>>>>>>>>> > > > > > > > > > > >> > >> various "experimental" text throughout the repo, etc., and otherwise
>>>>>>>>>> > > > > > > > > > > >> > >> treating it like Python and Java. There are also some Go-specific tasks
>>>>>>>>>> > > > > > > > > > > >> > >> around dependency versioning.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam model efficiently for most batch tasks,
>>>>>>>>>> > > > > > > > > > > >> > >> including basic windowing.
>>>>>>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute on, and are tested against, all portable
>>>>>>>>>> > > > > > > > > > > >> > >> runners.
>>>>>>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
>>>>>>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via
>>>>>>>>>> > > > > > > > > > > >> > >> cross-language transforms.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't
>>>>>>>>>> > > > > > > > > > > >> > >> help with that any further.
>>>>>>>>>> > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers
>>>>>>>>>> > > > > > > > > > > >> > >> dissuade users.
>>>>>>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK.
>>>>>>>>>> > > > > > > > > > > >> > >> (Contributions welcome.)
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Typically, when we call an SDK or API experimental, it's because there's a
>>>>>>>>>> > > > > > > > > > > >> > >> risk that its API or behaviors may change significantly.
>>>>>>>>>> > > > > > > > > > > >> > >> This, in turn, leads to additional work for users of the SDK on every
>>>>>>>>>> > > > > > > > > > > >> > >> release, which leads to sticking to older versions or forking to preserve
>>>>>>>>>> > > > > > > > > > > >> > >> behavior. Version updates should be looked forward to, and viewed as
>>>>>>>>>> > > > > > > > > > > >> > >> having little risk. Further, while there's been previous discussion about
>>>>>>>>>> > > > > > > > > > > >> > >> what the "low bar" is for a new SDK, it hasn't been summarily applied to
>>>>>>>>>> > > > > > > > > > > >> > >> the Go SDK. I feel this has hurt development and contribution of new SDK
>>>>>>>>>> > > > > > > > > > > >> > >> languages (the inherent difficulty of SDK development notwithstanding).
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam model
>>>>>>>>>> > > > > > > > > > > >> > >> should look like in an opinionated language like Go.
>>>>>>>>>> > > > > > > > > > > >> > >> The designers' initial take (see https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>>>>> > > > > > > > > > > >> > >> [0]) goes into detail on what it means to implement the Beam model in a
>>>>>>>>>> > > > > > > > > > > >> > >> language without generics, overloading, or inheritance. One could largely
>>>>>>>>>> > > > > > > > > > > >> > >> throw away static types (like Python), but this approach rings hollow for
>>>>>>>>>> > > > > > > > > > > >> > >> Go. It would not do if the approach couldn't grow and scale to the Beam
>>>>>>>>>> > > > > > > > > > > >> > >> model. It's also hard to tell if an API is any good before there are users.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write
>>>>>>>>>> > > > > > > > > > > >> > >> scalable DoFns, dynamically or otherwise. It's an incredible bottleneck to
>>>>>>>>>> > > > > > > > > > > >> > >> need to do all initial fanout of work on a single machine and write
>>>>>>>>>> > > > > > > > > > > >> > >> everything to a Reshuffle just in order to scale up.
>>>>>>>>>> > > > > > > > > > > >> > >> Without being able to scale, Beam is little more than overhead.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open
>>>>>>>>>> > > > > > > > > > > >> > >> source.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Background
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the Beam repo for a few years now, since it
>>>>>>>>>> > > > > > > > > > > >> > >> was accidentally merged into master.
>>>>>>>>>> > > > > > > > > > > >> > >> Since then it's been called experimental, and not officially part of the
>>>>>>>>>> > > > > > > > > > > >> > >> releases.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Of the SDKs, it's the one that was always designed around Beam Portability
>>>>>>>>>> > > > > > > > > > > >> > >> first. It never had any "legacy" (SDK x Runner specific) workers.
>>>>>>>>>> > > > > > > > > > > >> > >> It has always used the Beam Pipeline protos and FnAPI to execute jobs,
>>>>>>>>>> > > > > > > > > > > >> > >> first with some very experimental code on Dataflow, but now on all
>>>>>>>>>> > > > > > > > > > > >> > >> supported portable runners, like Flink, Spark, the Python Portable runner,
>>>>>>>>>> > > > > > > > > > > >> > >> and Dataflow.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> API Stability
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFns and pipeline
>>>>>>>>>> > > > > > > > > > > >> > >> construction since it was first merged in, and there are no changes to that
>>>>>>>>>> > > > > > > > > > > >> > >> on the horizon that can't be made in a backwards compatible manner. Largely
>>>>>>>>>> > > > > > > > > > > >> > >> these are related to new features, or usability improvements enabled by the
>>>>>>>>>> > > > > > > > > > > >> > >> advent of Go generics (think of "real" KV, emitter, and iterator types).
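To make the "real" KV, emitter, and iterator idea concrete, here is a hypothetical sketch using Go 1.18 type parameters. The `KV`, `Emitter`, and `Iter` names are illustrative only and are not part of the SDK; they simply give names to the func-typed emitter (`func(T)`) and iterator (`func(*T) bool`) shapes that Go SDK DoFns use today, plus a generic pair type. Because the new types would sit alongside the existing func shapes, they could be added without breaking existing DoFns.

```go
package main

import "fmt"

// KV is a hypothetical generic key/value pair (not an SDK type).
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Emitter and Iter are hypothetical generic names for the func-typed
// emitters and iterators that Go SDK DoFns are written against today.
type Emitter[T any] func(T)
type Iter[T any] func(*T) bool

// sumPerKey sums the values for one key, reading them from an Iter and
// writing the result to an Emitter, in the style of a post-GroupByKey DoFn.
func sumPerKey(key string, values Iter[int], emit Emitter[KV[string, int]]) {
	total, v := 0, 0
	for values(&v) {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

// sliceIter adapts a slice to the Iter shape for local testing.
func sliceIter[T any](xs []T) Iter[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

func main() {
	var got []KV[string, int]
	sumPerKey("a", sliceIter([]int{1, 2, 3}), func(kv KV[string, int]) { got = append(got, kv) })
	fmt.Println(got) // [{a 6}]
}
```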
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has largely been under work for use
>>>>>>>>>> > > > > > > > > > > >> > >> within Google. Its use there is called FlumeGo, representing the Apache
>>>>>>>>>> > > > > > > > > > > >> > >> Beam Go SDK running on top of Flume, Google's batch pipeline processing
>>>>>>>>>> > > > > > > > > > > >> > >> engine. Thus most of the focus has been on improving batch execution.
>>>>>>>>>> > > > > > > > > > > >> > >> FlumeGo sees ample use today, and there hasn't been a call for fundamental
>>>>>>>>>> > > > > > > > > > > >> > >> changes to the API for ergonomic or usability concerns.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Scalability
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Google could get away without the Go SDK having an SDK-side scalability
>>>>>>>>>> > > > > > > > > > > >> > >> solution as a result of its integration with Flume.
>>>>>>>>>> > > > > > > > > > > >> > >> However, those days are now past.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with dynamic splitting,
>>>>>>>>>> > > > > > > > > > > >> > >> which enables writing scalable batch transforms natively in the Go SDK.
>>>>>>>>>> > > > > > > > > > > >> > >> The SDK also supports cross-language transforms, with Beam Schema
>>>>>>>>>> > > > > > > > > > > >> > >> encodings. With them, production-hardened transforms from Java and Python
>>>>>>>>>> > > > > > > > > > > >> > >> are a wrapper away.
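The SplittableDoFn support mentioned above rests on one idea: a restriction (here, a range of offsets) that a worker claims piece by piece, and that a runner can split while processing is underway, handing the unclaimed remainder to another worker. The sketch below is a simplified, self-contained model of that idea in plain Go; `Restriction`, `Tracker`, `TryClaim`, and `TrySplit` are illustrative names chosen for this example, and the SDK's own restriction trackers differ in detail.

```go
package main

import "fmt"

// Restriction is a half-open range of offsets [Start, End) owned by one worker.
type Restriction struct {
	Start, End int64
}

// Tracker is a simplified, hypothetical restriction tracker. It is not the
// Beam Go SDK's tracker type; it only models claiming and dynamic splitting.
type Tracker struct {
	rest    Restriction
	claimed int64 // last claimed offset; Start-1 before the first claim
}

// NewTracker starts tracking the given restriction with nothing claimed yet.
func NewTracker(r Restriction) *Tracker {
	return &Tracker{rest: r, claimed: r.Start - 1}
}

// TryClaim reports whether the worker may process offset pos. Claims must
// move forward, and they fail once pos reaches the (possibly shrunk) end.
func (t *Tracker) TryClaim(pos int64) bool {
	if pos <= t.claimed || pos >= t.rest.End {
		return false
	}
	t.claimed = pos
	return true
}

// TrySplit keeps roughly `fraction` (0 <= fraction < 1) of the unclaimed work
// in the current restriction and returns the rest as a residual that a runner
// could hand to another worker. ok is false if nothing can be split off.
func (t *Tracker) TrySplit(fraction float64) (residual Restriction, ok bool) {
	next := t.claimed + 1
	splitAt := next + int64(float64(t.rest.End-next)*fraction)
	if splitAt >= t.rest.End {
		return Restriction{}, false
	}
	residual = Restriction{Start: splitAt, End: t.rest.End}
	t.rest.End = splitAt // this worker now stops at the split point
	return residual, true
}

func main() {
	t := NewTracker(Restriction{Start: 0, End: 10})
	processed := 0
	for pos := int64(0); t.TryClaim(pos); pos++ {
		processed++ // the element at offset pos would be processed here
		if pos == 3 {
			// Simulate the runner requesting a dynamic split mid-bundle.
			if res, ok := t.TrySplit(0.5); ok {
				fmt.Println("residual:", res) // residual: {7 10}
			}
		}
	}
	fmt.Println("processed:", processed) // processed: 7
}
```

After the split at offset 3, the six unclaimed offsets are divided in half: this worker keeps [4, 7) and [7, 10) becomes work another worker can pick up, which is what removes the single-machine fanout bottleneck.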
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF-side work and
>>>>>>>>>> > > > > > > > > > > >> > >> completed the xlang work) is adding a wrapper for the Java Kafka IO using
>>>>>>>>>> > > > > > > > > > > >> > >> cross-language transforms, which has often been requested. This will also
>>>>>>>>>> > > > > > > > > > > >> > >> enable use of the Beam SQL transforms that Java provides.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Features
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam core. It implements standard coders, allows
>>>>>>>>>> > > > > > > > > > > >> > >> for user DoFns and CombineFns, and provides access to core transforms like
>>>>>>>>>> > > > > > > > > > > >> > >> Flatten and GroupByKey, and to features like Side Inputs, Windowing, and
>>>>>>>>>> > > > > > > > > > > >> > >> User Metrics.
>>>>>>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported for batch, even through lifted
>>>>>>>>>> > > > > > > > > > > >> > >> combines, in the 2.32.0 release.
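Combiner lifting with windowing can be pictured as two phases: each bundle partially combines its elements per (window, key) before the shuffle, and the partial accumulators for the same (window, key) are merged and finalized afterwards. The sketch below assumes one-minute fixed windows and a mean combiner; the function names echo the CombineFn method names (AddInput, MergeAccumulators, ExtractOutput), but they are plain hypothetical functions written for this example, not the SDK's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// Elem is a timestamped key/value element (timestamps in seconds).
type Elem struct {
	Key   string
	Value int
	Ts    int64
}

// windowKey identifies a (fixed window, key) pair; Start is the window start.
type windowKey struct {
	Start int64
	Key   string
}

// fixedWindowStart assigns a non-negative timestamp to its fixed window.
func fixedWindowStart(ts, size int64) int64 { return ts - ts%size }

// meanAcc is the accumulator for a mean combiner.
type meanAcc struct{ Sum, Count int }

func addInput(a meanAcc, v int) meanAcc      { return meanAcc{a.Sum + v, a.Count + 1} }
func mergeAccumulators(a, b meanAcc) meanAcc { return meanAcc{a.Sum + b.Sum, a.Count + b.Count} }
func extractOutput(a meanAcc) float64        { return float64(a.Sum) / float64(a.Count) }

// partialCombine is the lifted, pre-shuffle phase: a bundle folds its
// elements into one accumulator per (window, key).
func partialCombine(bundle []Elem, size int64) map[windowKey]meanAcc {
	out := map[windowKey]meanAcc{}
	for _, e := range bundle {
		wk := windowKey{fixedWindowStart(e.Ts, size), e.Key}
		out[wk] = addInput(out[wk], e.Value)
	}
	return out
}

// mergeAndExtract is the post-shuffle phase: accumulators for the same
// (window, key) from different bundles are merged, then finalized.
func mergeAndExtract(partials []map[windowKey]meanAcc) map[windowKey]float64 {
	merged := map[windowKey]meanAcc{}
	for _, p := range partials {
		for wk, acc := range p {
			merged[wk] = mergeAccumulators(merged[wk], acc)
		}
	}
	out := map[windowKey]float64{}
	for wk, acc := range merged {
		out[wk] = extractOutput(acc)
	}
	return out
}

func main() {
	const size = 60 // one-minute fixed windows
	bundleA := []Elem{{"a", 1, 5}, {"a", 3, 30}, {"a", 10, 65}}
	bundleB := []Elem{{"a", 5, 50}, {"a", 20, 70}}
	res := mergeAndExtract([]map[windowKey]meanAcc{
		partialCombine(bundleA, size), partialCombine(bundleB, size),
	})
	keys := make([]windowKey, 0, len(res))
	for wk := range res {
		keys = append(keys, wk)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Start < keys[j].Start })
	for _, wk := range keys {
		fmt.Printf("window [%d,%d) key=%s mean=%.1f\n", wk.Start, wk.Start+size, wk.Key, res[wk])
	}
	// window [0,60) key=a mean=3.0
	// window [60,120) key=a mean=15.0
}
```

Keying the partial accumulators by window as well as by key is what keeps lifting correct under windowing: values from different windows are never merged, even when they share a key and a bundle.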
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be versatile for batch execution on
>>>>>>>>>> > > > > > > > > > > >> > >> portable runners, and for simple streaming pipelines.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Repo Testing
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> On precommit, the Go SDK runs all its unit tests. On top of that, it runs
>>>>>>>>>> > > > > > > > > > > >> > >> all its integration tests against the Python Portable runner, which detects
>>>>>>>>>> > > > > > > > > > > >> > >> breaking changes quickly and robustly without overspending community
>>>>>>>>>> > > > > > > > > > > >> > >> resources. Those same tests are also run against Dataflow, Flink, and Spark.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The tests are executable against all runners via the appropriate Go
>>>>>>>>>> > > > > > > > > > > >> > >> commands (if you've stood up your own job management server) or Gradle
>>>>>>>>>> > > > > > > > > > > >> > >> commands (which will spin up runner instances for you). Documentation for
>>>>>>>>>> > > > > > > > > > > >> > >> executing tests and adding new ones is on the wiki [2]. They are accessible
>>>>>>>>>> > > > > > > > > > > >> > >> to Go developers, as they're implemented with the standard Go testing tools.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Shortcomings
>>>>>>>>>> > > > > > > > > > > >> > >> That said, there's still much to do. Let me briefly tell you what doesn't
>>>>>>>>>> > > > > > > > > > > >> > >> work; it's up to you to weigh whether these gaps block being out of
>>>>>>>>>> > > > > > > > > > > >> > >> experimental.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> At present, only textio has been implemented as a SplittableDoFn.
>>>>>>>>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for
>>>>>>>>>> > > > > > > > > > > >> > >> future contributions of new transform wrappers for the Go SDK.
>>>>>>>>>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at this point users are empowered to
>>>>>>>>>> > > > > > > > > > > >> > >> write their own DoFns or wrap existing transforms for cross-language use.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> In the core SDK, more streaming-focused features have yet to be
>>>>>>>>>> > > > > > > > > > > >> > >> implemented, but they're largely additions to what exists already rather
>>>>>>>>>> > > > > > > > > > > >> > >> than total rebuilds. Much of the work is defining how a user specifies
>>>>>>>>>> > > > > > > > > > > >> > >> their desires, and turning those into the appropriate FnAPI requests at
>>>>>>>>>> > > > > > > > > > > >> > >> execution time. Back in October I wrote at length on the wiki [1] about
>>>>>>>>>> > > > > > > > > > > >> > >> what's missing for additional streaming features.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> While we have bolstered our testing recently, there's likely still more we
>>>>>>>>>> > > > > > > > > > > >> > >> could test to improve our confidence in the SDK, in particular regarding
>>>>>>>>>> > > > > > > > > > > >> > >> the included transform libraries and examples.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Moving Forward
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the
>>>>>>>>>> > > > > > > > > > > >> > >> Beam Programming Guide. I've audited the guide [3], and am beginning to add
>>>>>>>>>> > > > > > > > > > > >> > >> missing content and fill in the Go-specific gaps. This will be tied to
>>>>>>>>>> > > > > > > > > > > >> > >> improving the GoDoc with more Go-specific user documentation that isn't
>>>>>>>>>> > > > > > > > > > > >> > >> appropriate for the BPG, and to resolving the LICENSE issue around the
>>>>>>>>>> > > > > > > > > > > >> > >> public display of that GoDoc.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK
>>>>>>>>>> > > > > > > > > > > >> > >> into the release process and remove the "experimental" language around the
>>>>>>>>>> > > > > > > > > > > >> > >> SDK. This largely entails updating the release scripts to also build and
>>>>>>>>>> > > > > > > > > > > >> > >> publish the Go SDK Docker containers.
>>>>>>>>>> > > > > > > > > > > >> > >> As for releasing the code, we're technically already doing so whenever we
>>>>>>>>>> > > > > > > > > > > >> > >> tag a release branch [4].
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK
>>>>>>>>>> > > > > > > > > > > >> > >> to use Go Modules for dependency version control, which Daniel is planning
>>>>>>>>>> > > > > > > > > > > >> > >> to work on after his Kafka task. This will put our repo infrastructure, SDK
>>>>>>>>>> > > > > > > > > > > >> > >> contributors, and users on the same footing when it comes to dependency
>>>>>>>>>> > > > > > > > > > > >> > >> management. It will remove the "+incompatible" tags one sees on the
>>>>>>>>>> > > > > > > > > > > >> > >> pkg.go.dev list at [4].
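The "+incompatible" suffix comes from how the go command treats repositories that predate modules: a v2-or-later tag on a module whose path has no /vN suffix, with no go.mod file at that version, is listed with "+incompatible" appended. Below is a minimal, simplified sketch of that rule; `listedVersion` and its helpers are hypothetical names, and the model deliberately ignores the case where the go command rejects such a version outright.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion returns N for a tag like "vN.x.y", or -1 if it can't parse it.
func majorVersion(tag string) int {
	s := strings.TrimPrefix(tag, "v")
	if i := strings.IndexByte(s, '.'); i >= 0 {
		s = s[:i]
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return -1
	}
	return n
}

// hasMajorSuffix reports whether a module path ends in /vN with N >= 2.
func hasMajorSuffix(modPath string) bool {
	i := strings.LastIndex(modPath, "/v")
	if i < 0 {
		return false
	}
	n, err := strconv.Atoi(modPath[i+2:])
	return err == nil && n >= 2
}

// listedVersion is a simplified model of when a tagged version is marked
// "+incompatible": the tag is v2 or later, the module path has no /vN
// suffix, and there is no go.mod file at that version.
func listedVersion(modPath, tag string, hasGoMod bool) string {
	if majorVersion(tag) >= 2 && !hasMajorSuffix(modPath) && !hasGoMod {
		return tag + "+incompatible"
	}
	return tag
}

func main() {
	// Pre-modules: the repository root acts as the module path, with no go.mod.
	fmt.Println(listedVersion("github.com/apache/beam", "v2.30.0", false)) // v2.30.0+incompatible
	// With a go.mod declaring a /v2 module path.
	fmt.Println(listedVersion("github.com/apache/beam/sdks/v2", "v2.33.0", true)) // v2.33.0
}
```

Since Beam's release tags were already at v2.x, adopting a go.mod with a /v2 module path is what lets those tags be listed as ordinary versions.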
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and
>>>>>>>>>> > > > > > > > > > > >> > >> provide additional links as needed. I intentionally avoided a link barrage
>>>>>>>>>> > > > > > > > > > > >> > >> in this email, as links can distract from the point: the SDK is ready for
>>>>>>>>>> > > > > > > > > > > >> > >> folks to use, and we need to tell them that they can rather than that they
>>>>>>>>>> > > > > > > > > > > >> > >> shouldn't.
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> Robert Burke
>>>>>>>>>> > > > > > > > > > > >> > >> De facto Beam Go TL
>>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>>> > > > > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>>>>> > > > > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>>>>>>>>>> > > > > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>>>>>>>>>> > > > > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
>>>>>>>>>> > > > > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>>>>>>>>>> > > > > > > > > > > >> >
>>>>>>>>>>
>>>>>>>>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Sachin Agarwal <sa...@google.com>.
We have also started working with the Google Golang devrel team on
messaging once the batteries are “included”. They’re super excited!

On Wed, Nov 24, 2021 at 10:24 AM Ahmet Altay <al...@google.com> wrote:

> Great. I retweeted that on the official account.
>
> On Wed, Nov 24, 2021 at 10:14 AM Robert Burke <ro...@frantil.com> wrote:
>
>> I think so.
>>
>> My tweet [1] on the topic got a bit of traction even without the official
>> Beam account boosting it.
>>
>> [1]
>> https://twitter.com/lostluck/status/1456720240092467200?t=owVJd6ZuTVMUkNyvYNr4Xg&s=19
>>
>> On Wed, Nov 24, 2021, 10:11 AM Ahmet Altay <al...@google.com> wrote:
>>
>>> Thank you Rebo, and congratulations to everyone working on Go SDK :)
>>>
>>> @Robert Burke <re...@google.com> @Brittany Hermann <he...@google.com>
>>> @Sachin Agarwal <sa...@google.com> - Should we share this on Beam's
>>> Twitter and other social media pages?
>>>
>>> On Fri, Nov 5, 2021 at 1:29 PM Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> It's my great pleasure to announce that the Apache Beam Go SDK is no
>>>> longer experimental. https://beam.apache.org/blog/go-sdk-release/
>>>>
>>>> Thank you everyone.
>>>> Robert Burke
>>>> Beam Go Busybody
>>>>
>>>> On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:
>>>>
>>>>> At this point I just need an LGTM on the blog post PR, as the draft is
>>>>> finalized.
>>>>>
>>>>> Udi added the sdks/v2.33.0 tag, which works as expected. I've also
>>>>> verified that the appropriate container is used by default when not
>>>>> specified, which was the last unknown in this process.
>>>>>
>>>>> Who's ready to release a new SDK? I am!
>>>>>
>>>>> https://github.com/apache/beam/pull/15894 (or join the exciting
>>>>> reaction emoji on the top post).
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>>>>>
>>>>>> The current draft of the exit blog post is
>>>>>> https://github.com/apache/beam/pull/15894
>>>>>> Comments are very welcome. I'm going to continue looking for Known
>>>>>> issues (which will be linked to their respective JIRAs) tomorrow.
>>>>>>
>>>>>> Since RC1 is getting cycled, I can also go back to the original plan
>>>>>> of v2.33.0, if we'd like to get it out this week.
>>>>>>
>>>>>>
>>>>>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>>>>>
>>>>>>> Investigation yielded that there's no way around the prefixed tags.
>>>>>>> The JIRA has been commented with the explanation.
>>>>>>>
>>>>>>> https://github.com/apache/beam/pull/15881 has the release script
>>>>>>> updates.
>>>>>>>
>>>>>>> I'm working on the Exit blogpost and the updated Go SDK roadmap. The
>>>>>>> draft PR will be linked here.
>>>>>>>
>>>>>>> Since 2.34.0 is almost out (assuming RC1 verification goes well) I'm
>>>>>>> inclined to wait for that release to finish before publishing the blogpost.
>>>>>>> I'll link the draft PR here as soon as it's ready.
>>>>>>>
>>>>>>> Once 2.34.0 is released, I'm inclined to also have 2.33.0 prefix-tagged
>>>>>>> so there isn't a gap in versions between the unmoduled and moduled
>>>>>>> code.
>>>>>>>
>>>>>>> Once it's published, that'll be the end of this thread.
>>>>>>>
>>>>>>> Thank you very much everyone.
>>>>>>>
>>>>>>> Robert Burke
>>>>>>> Beam Go Busybody
>>>>>>>
>>>>>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> +1 to extra tags. They'll be trivial to add to our release process,
>>>>>>>> and git tags are lightweight by design so I don't foresee any problems.
>>>>>>>>
>>>>>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <
>>>>>>>> robertwb@google.com> wrote:
>>>>>>>>
>>>>>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>>>>>> too much of a problem (until, hopefully, it's fixed on the go
>>>>>>>>> side).
>>>>>>>>>
>>>>>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> >
>>>>>>>>> > With Kyle's help with the additional tagging of the next RC, we
>>>>>>>>> > have validated that this is the currently correct approach.
>>>>>>>>> >
>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>>>>>> >
>>>>>>>>> > Or even:
>>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to
>>>>>>>>> > latest tagged version)
>>>>>>>>> >
>>>>>>>>> > The main cost to this approach is doubling the number of tags in
>>>>>>>>> > the tags list (https://github.com/apache/beam/tags), which is not
>>>>>>>>> > ideal, but overall a small cost. There's no need for a "full publish"
>>>>>>>>> > of these additional tags, so we won't be doubling our "releases" (see
>>>>>>>>> > https://github.com/apache/beam/releases).
>>>>>>>>> >
>>>>>>>>> > I'll still be filing a bug against the Go commands, since the
>>>>>>>>> > mandatory prefixing is unintuitive and seems unnecessary. If it
>>>>>>>>> > becomes unnecessary, we can always delete the tags from the affected
>>>>>>>>> > branches and cease the behavior going forward. I'll search through
>>>>>>>>> > the existing Go issues first, however, to see if this has been
>>>>>>>>> > previously discussed, and report my findings here either way.
>>>>>>>>> >
>>>>>>>>> > This does require two small changes to the release guide: the RC
>>>>>>>>> > tagging script, and the final tagging:
>>>>>>>>> >
>>>>>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>>>>>> >
>>>>>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>>>>>> >
>>>>>>>>> > I'll make this change later this week (or early next), assuming
>>>>>>>>> > there are no objections.
>>>>>>>>> >
>>>>>>>>> > Thank you all very much for your patience,
>>>>>>>>> > Robert Burke
>>>>>>>>> > Beam Go Busybody
>>>>>>>>> >
>>>>>>>>> >
>>>>>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>>>> > > After much research into the Go Modules documentation, I have
>>>>>>>>> > > confirmed what the issue is.
>>>>>>>>> > >
>>>>>>>>> > > We added the go.mod file to sdks/ under the repo root because it's a
>>>>>>>>> > > cleaner spot for the change, captures the Java and Python container
>>>>>>>>> > > boot code (written in Go) into the module, and avoids conflicting
>>>>>>>>> > > interpretations of the vendor directory that lives at the root level.
>>>>>>>>> > >
>>>>>>>>> > > However, we missed that when doing so, the standard version tags
>>>>>>>>> > > would only apply to modules at the root level, not to modules in
>>>>>>>>> > > subdirectories. See https://golang.org/ref/mod#vcs-version, but
>>>>>>>>> > > quoting the important paragraph:
>>>>>>>>> > >
>>>>>>>>> > > > If a module is defined in a subdirectory within the repository, that
>>>>>>>>> > > > is, the module subdirectory portion of the module path is not empty,
>>>>>>>>> > > > then each tag name must be prefixed with the module subdirectory,
>>>>>>>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls
>>>>>>>>> > > > is defined in the gopls subdirectory of the repository with root path
>>>>>>>>> > > > golang.org/x/tools. The version v0.4.0 of that module must have the
>>>>>>>>> > > > tag named gopls/v0.4.0 in that repository.
>>>>>>>>> > >
>>>>>>>>> > > Specifically, for the Go SDK to be fetchable at the right version,
>>>>>>>>> > > we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>>>>>> > >
>>>>>>>>> > > So, the fix for the Go versioning issue is to amend our release
>>>>>>>>> > > process (including generating Release Candidate builds) to also add
>>>>>>>>> > > a prefixed version tag with the same version.
>>>>>>>>> > >
>>>>>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there
>>>>>>>>> > > are no objections we can back-update the 2.33.0 release branch with
>>>>>>>>> > > such a prefixed tag. At that point I can also write the official
>>>>>>>>> > > Experimental Exit blog post.
>>>>>>>>> > >
>>>>>>>>> > > Thank you all for your patience.
>>>>>>>>> > > Robert Burke
>>>>>>>>> > >
>>>>>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>>>>>> > > > Thank you for the detailed update! Let us know if we can
>>>>>>>>> help.
>>>>>>>>> > > >
>>>>>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <
>>>>>>>>> lostluck@apache.org> wrote:
>>>>>>>>> > > >
>>>>>>>>> > > > > This is a status update.
>>>>>>>>> > > > >
>>>>>>>>> > > > > At this point 2.33.0 is released, but there are
>>>>>>>>> difficulties with
>>>>>>>>> > > > > accessing the tagged versions using the standard go tools.
>>>>>>>>> It's currently
>>>>>>>>> > > > > under investigation.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Using the v2 path in a Go program and then running `go mod
>>>>>>>>> tidy` will populate
>>>>>>>>> > > > > the go.mod file with a pseudo-version rather than the latest tag
>>>>>>>>> (v2.33.0) (e.g.
>>>>>>>>> > > > > the line looks like
>>>>>>>>> > > > > require github.com/apache/beam/sdks/v2
>>>>>>>>> v2.0.0-20211013181004-a9120e083008
>>>>>>>>> > > > > )
>>>>>>>>> > > > >
>>>>>>>>> > > > > While this will work, it's not the desired experience for
>>>>>>>>> users at this
>>>>>>>>> > > > > point. The current downside is that the releases are not
>>>>>>>>> meaningful targets for
>>>>>>>>> > > > > some reason. However, we retain the other benefits of Go
>>>>>>>>> Modules (actual
>>>>>>>>> > > > > dependency versioning, management by go tools).
>>>>>>>>> > > > >
>>>>>>>>> > > > > The issue is some combination of the Go tooling [A], that
>>>>>>>>> we added a go
>>>>>>>>> > > > > mod file outside of the repo root [B], and that we did not
>>>>>>>>> increment the
>>>>>>>>> > > > > major version (v2 -> v3) when adding the go mod file [C].
>>>>>>>>> > > > >
>>>>>>>>> > > > > [B] From the go documentation, this should be legal and
>>>>>>>>> fine, even if it's
>>>>>>>>> > > > > not recommended. This is fortunate because the root of the
>>>>>>>>> repo would have
>>>>>>>>> > > > > played poorly with the root vendor directory, which the Go
>>>>>>>>> tools have opinions
>>>>>>>>> > > > > on.
>>>>>>>>> > > > >
>>>>>>>>> > > > > [C] Incrementing the major version is recommended, in the
>>>>>>>>> Go Modules
>>>>>>>>> > > > > documentation, when transitioning to Go Modules. However,
>>>>>>>>> it never said it
>>>>>>>>> > > > > was required, nor did it indicate this current failure
>>>>>>>>> mode. If anything
>>>>>>>>> > > > > this should be documented in those docs, if it's not
>>>>>>>>> another bug. We would
>>>>>>>>> > > > > not necessarily want to declare a global v3 for beam at
>>>>>>>>> this time, for just
>>>>>>>>> > > > > the Go SDK; it would become confusing rather quickly.
>>>>>>>>> Notionally there are
>>>>>>>>> > > > > some larger breaking changes the Java and Python SDKs
>>>>>>>>> would want to make in
>>>>>>>>> > > > > such an event, and thus it's a larger conversation, that
>>>>>>>>> is out of scope at
>>>>>>>>> > > > > this time.
>>>>>>>>> > > > >
>>>>>>>>> > > > > This leaves [A], where some misunderstanding of the
>>>>>>>>> documented semantics
>>>>>>>>> > > > > occurred. I certainly expected the tagged version of the
>>>>>>>>> non-root go-module
>>>>>>>>> > > > > to be inherited from the parent, not wholesale ignored. As
>>>>>>>>> a result, I'll
>>>>>>>>> > > > > be filing a bug against the go tools to determine this,
>>>>>>>>> and see what paths
>>>>>>>>> > > > > forward exist.
>>>>>>>>> > > > >
>>>>>>>>> > > > > It's my hope to resolve this before we write a proper
>>>>>>>>> Experimental Exit
>>>>>>>>> > > > > blog post for the Go SDK.
>>>>>>>>> > > > >
>>>>>>>>> > > > > Thank you for your patience, and time.
>>>>>>>>> > > > > Robert Burke
>>>>>>>>> > > > > Beam Go Busybody
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > > >
>>>>>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org>
>>>>>>>>> wrote:
>>>>>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the
>>>>>>>>> SDK now uses Go
>>>>>>>>> > > > > Modules for dependency management, simplifying Go SDK
>>>>>>>>> contributions. [2]
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > The Module file lives in the sdks/ directory so there's
>>>>>>>>> a single Go
>>>>>>>>> > > > > Module for the whole SDK, tests, examples, and any support
>>>>>>>>> code for the
>>>>>>>>> > > > > container boot builds. This excludes the Go SDK Code katas
>>>>>>>>> [3] go modules
>>>>>>>>> > > > > which can be updated once 2.33.0 has been released.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release
>>>>>>>>> builds, and
>>>>>>>>> > > > > by default uses the release-specific container for Docker
>>>>>>>>> execution jobs. For
>>>>>>>>> > > > > at least the 2.33.0 release, this does mean that manual
>>>>>>>>> validation will
>>>>>>>>> > > > > need to explicitly specify RC versions of containers.
>>>>>>>>> However, given that
>>>>>>>>> > > > > the Go SDK container and worker boot process rarely
>>>>>>>>> changes, this is
>>>>>>>>> > > > > unlikely to be an issue.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > At present I'm cleaning up some of the references to
>>>>>>>>> experimental, and
>>>>>>>>> > > > > making it clear that 2.33.0 is the first non-experimental
>>>>>>>>> release (even
>>>>>>>>> > > > > though that's 4-6 weeks out from the actual release).
>>>>>>>>> CHANGES.md will be
>>>>>>>>> > > > > updated to note the event, but a larger blogpost will
>>>>>>>>> happen after the
>>>>>>>>> > > > > release goes public.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > Cheers,
>>>>>>>>> > > > > > Robert Burke
>>>>>>>>> > > > > > De facto Beam Go TL.
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > [1]
>>>>>>>>> > > > >
>>>>>>>>> https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>>>>>> > > > > > [3]
>>>>>>>>> https://github.com/apache/beam/tree/master/learning/katas/go
>>>>>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>>>>>> > > > > >
>>>>>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com>
>>>>>>>>> wrote:
>>>>>>>>> > > > > > > +1, congratulations & thank you!
>>>>>>>>> > > > > > >
>>>>>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <
>>>>>>>>> lostluck@apache.org>
>>>>>>>>> > > > > wrote:
>>>>>>>>> > > > > > >
>>>>>>>>> > > > > > > > Regarding documentation update: Initial PR is
>>>>>>>>> > > > > > > > https://github.com/apache/beam/pull/15057 which
>>>>>>>>> goes up to section
>>>>>>>>> > > > > ~4.3.
>>>>>>>>> > > > > > > > JIRA link for Programming Guide changes:
>>>>>>>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>>>>>>>> > > > > > > >
>>>>>>>>> > > > > > > >
>>>>>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <
>>>>>>>>> robert@frantil.com> wrote:
>>>>>>>>> > > > > > > > > Yup!
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > My immediate plan is to work on incorporating the
>>>>>>>>> Go SDK fully
>>>>>>>>> > > > > into the
>>>>>>>>> > > > > > > > > Beam Programming Guide. I've audited the guide, and
>>>>>>>>> > > > > > > > > am beginning to add missing content and filling in
>>>>>>>>> the Go specific
>>>>>>>>> > > > > gaps.
>>>>>>>>> > > > > > > > > This will be tied to improving the Go Doc with
>>>>>>>>> more Go
>>>>>>>>> > > > > > > > > specific user documentation that isn't appropriate
>>>>>>>>> for the BPG.
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > My audit of the guide is here:
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > >
>>>>>>>>> > > > >
>>>>>>>>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > The other sheets focus on features and tests. The
>>>>>>>>> feature page
>>>>>>>>> > > > > looks
>>>>>>>>> > > > > > > > worse
>>>>>>>>> > > > > > > > > than it is, as it was more productive to focus on
>>>>>>>>> what isn't
>>>>>>>>> > > > > available
>>>>>>>>> > > > > > > > than
>>>>>>>>> > > > > > > > > what is. That's a snapshot of my actual working
>>>>>>>>> sheet but I'll be
>>>>>>>>> > > > > > > > updating
>>>>>>>>> > > > > > > > > it as needed.
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <
>>>>>>>>> iemejia@gmail.com>
>>>>>>>>> > > > > wrote:
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > > Oops, forgot to write one question. Will this
>>>>>>>>> come with revamped
>>>>>>>>> > > > > > > > > > website instructions/doc for golang too?
>>>>>>>>> > > > > > > > > >
>>>>>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <
>>>>>>>>> iemejia@gmail.com>
>>>>>>>>> > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > >
>>>>>>>>> > > > > > > > > > > Huge +1
>>>>>>>>> > > > > > > > > > >
>>>>>>>>> > > > > > > > > > > This is definitely something many people have
>>>>>>>>> asked about, so
>>>>>>>>> > > > > it is
>>>>>>>>> > > > > > > > > > > great to see it finally happening.
>>>>>>>>> > > > > > > > > > >
>>>>>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth
>>>>>>>>> Knowles <
>>>>>>>>> > > > > kenn@apache.org>
>>>>>>>>> > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > > >
>>>>>>>>> > > > > > > > > > > > +1 awesome
>>>>>>>>> > > > > > > > > > > >
>>>>>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert
>>>>>>>>> Burke <
>>>>>>>>> > > > > lostluck@apache.org
>>>>>>>>> > > > > > > > >
>>>>>>>>> > > > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim
>>>>>>>>> to get those (Go
>>>>>>>>> > > > > > > > modules
>>>>>>>>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut, and
>>>>>>>>> certainly
>>>>>>>>> > > > > before the
>>>>>>>>> > > > > > > > 2.33
>>>>>>>>> > > > > > > > > > cut if release images aren't added to the 2.32
>>>>>>>>> process.
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the
>>>>>>>>> future, we may
>>>>>>>>> > > > > want a
>>>>>>>>> > > > > > > > > > harder break between a newer generics-first API
>>>>>>>>> and the
>>>>>>>>> > > > > current
>>>>>>>>> > > > > > > > version,
>>>>>>>>> > > > > > > > > > but there's no rush. Generics/TypeParameters in
>>>>>>>>> Go aren't
>>>>>>>>> > > > > identical to
>>>>>>>>> > > > > > > > the
>>>>>>>>> > > > > > > > > > feature referred to by that term in Java, C++,
>>>>>>>>> Rust, etc, so
>>>>>>>>> > > > > it'll
>>>>>>>>> > > > > > > > take a
>>>>>>>>> > > > > > > > > > bit of time for that expertise to develop.
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> However, by the current nature of Go, we
>>>>>>>>> had to have pretty
>>>>>>>>> > > > > > > > > > sophisticated reflective analysis to handle
>>>>>>>>> DoFns and map them
>>>>>>>>> > > > > to their
>>>>>>>>> > > > > > > > > > graph inputs. So, adding new helpers like KV,
>>>>>>>>> emitter, and
>>>>>>>>> > > > > iterator
>>>>>>>>> > > > > > > > > > types shouldn't be too difficult. Changing Go
>>>>>>>>> SDK internals to
>>>>>>>>> > > > > use
>>>>>>>>> > > > > > > > > > generics (such as the implementation of stats DoFns
>>>>>>>>> like Min, Max,
>>>>>>>>> > > > > etc.)
>>>>>>>>> > > > > > > > could
>>>>>>>>> > > > > > > > > > also be made transparently to most
>>>>>>>>> users, and
>>>>>>>>> > > > > certainly any
>>>>>>>>> > > > > > > > of
>>>>>>>>> > > > > > > > > > the framework for execution time handling (the
>>>>>>>>> "worker's SDK
>>>>>>>>> > > > > harness")
>>>>>>>>> > > > > > > > > > could be cleaned up if need be.
>>>>>>>>> Finally, adding more
>>>>>>>>> > > > > > > > > > sophisticated DoFn registration and code
>>>>>>>>> generation would be
>>>>>>>>> > > > > able to
>>>>>>>>> > > > > > > > > > replace the optional code generator entirely,
>>>>>>>>> saving some users
>>>>>>>>> > > > > a `go
>>>>>>>>> > > > > > > > > > generate` step, simplifying getting improved
>>>>>>>>> execution
>>>>>>>>> > > > > performance.
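As an illustration of the kind of helper described above, here is a minimal sketch (Go 1.18 or later) of a type-parameterized KV and a single generic "max per key" function that covers every ordered element type at once. Both names are hypothetical and not part of the Beam Go SDK, and the function runs over an in-memory slice rather than a PCollection. The Ordered constraint is written out by hand to avoid extra dependencies.

```go
package main

import "fmt"

// KV is a hypothetical type-parameterized key/value pair.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Ordered lists the built-in types that support the < and > operators.
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

// maxPerKey keeps the largest value seen for each key. One generic
// definition replaces a family of per-type variants.
func maxPerKey[K comparable, V Ordered](in []KV[K, V]) map[K]V {
	out := make(map[K]V)
	for _, kv := range in {
		if cur, ok := out[kv.Key]; !ok || kv.Value > cur {
			out[kv.Key] = kv.Value
		}
	}
	return out
}

func main() {
	ints := []KV[string, int]{{"a", 3}, {"b", 1}, {"a", 7}}
	fmt.Println(maxPerKey(ints)) // map[a:7 b:1]
	strs := []KV[int, string]{{1, "pear"}, {1, "apple"}}
	fmt.Println(maxPerKey(strs)) // map[1:pear]
}
```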
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Changing things like making a Type
>>>>>>>>> Parameterized
>>>>>>>>> > > > > PCollection
>>>>>>>>> > > > > > > > would
>>>>>>>>> > > > > > > > > > be far more involved, as would trying to use
>>>>>>>>> some kind of Apply
>>>>>>>>> > > > > > > > format. The
>>>>>>>>> > > > > > > > > > lack of Method Overrides prevents the apply
>>>>>>>>> chaining approach.
>>>>>>>>> > > > > Or at
>>>>>>>>> > > > > > > > least
>>>>>>>>> > > > > > > > > > prevents it from working simply.
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Finally, Go Generics won't be available
>>>>>>>>> until Go 1.18,
>>>>>>>>> > > > > which isn't
>>>>>>>>> > > > > > > > > > until next year. See
>>>>>>>>> https://blog.golang.org/generics-proposal
>>>>>>>>> > > > > for
>>>>>>>>> > > > > > > > > > details.
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17
>>>>>>>>> does include a
>>>>>>>>> > > > > register-based
>>>>>>>>> > > > > > > > > > calling convention, leading to a modest
>>>>>>>>> performance improvement
>>>>>>>>> > > > > across
>>>>>>>>> > > > > > > > the
>>>>>>>>> > > > > > > > > > board.
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> Cheers,
>>>>>>>>> > > > > > > > > > > >> Robert Burke
>>>>>>>>> > > > > > > > > > > >>
>>>>>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>>>>>>>>> > > > > robertwb@google.com>
>>>>>>>>> > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of
>>>>>>>>> experimental once
>>>>>>>>> > > > > the Go
>>>>>>>>> > > > > > > > > > Modules
>>>>>>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK
>>>>>>>>> needs to support
>>>>>>>>> > > > > every
>>>>>>>>> > > > > > > > > > feature
>>>>>>>>> > > > > > > > > > > >> > to be accepted, especially now that we
>>>>>>>>> can do
>>>>>>>>> > > > > cross-language
>>>>>>>>> > > > > > > > > > > >> > transforms, and Go definitely supports
>>>>>>>>> enough to be quite
>>>>>>>>> > > > > > > > useful.
>>>>>>>>> > > > > > > > > > (WRT
>>>>>>>>> > > > > > > > > > > >> > streaming, my understanding is that Go
>>>>>>>>> supports the
>>>>>>>>> > > > > streaming
>>>>>>>>> > > > > > > > model
>>>>>>>>> > > > > > > > > > > >> > with windows and timestamps, and runs
>>>>>>>>> fine on a streaming
>>>>>>>>> > > > > > > > runner,
>>>>>>>>> > > > > > > > > > even
>>>>>>>>> > > > > > > > > > > >> > if more advanced features like state and
>>>>>>>>> timers aren't yet
>>>>>>>>> > > > > > > > > > available.)
>>>>>>>>> > > > > > > > > > > >> >
>>>>>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>>>>>> > > > > > > > > > > >> >
>>>>>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson
>>>>>>>>> Hamilton <
>>>>>>>>> > > > > > > > tysonjh@google.com>
>>>>>>>>> > > > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental
>>>>>>>>> status after Go
>>>>>>>>> > > > > Modules
>>>>>>>>> > > > > > > > > > are completed and the LICENSE issue is resolved.
>>>>>>>>> I don't think
>>>>>>>>> > > > > that
>>>>>>>>> > > > > > > > lacking
>>>>>>>>> > > > > > > > > > streaming support is a blocker. The other thing
>>>>>>>>> I checked to see
>>>>>>>>> > > > > was if
>>>>>>>>> > > > > > > > > > there were metrics available on
>>>>>>>>> metrics.beam.apache.org,
>>>>>>>>> > > > > specifically
>>>>>>>>> > > > > > > > for
>>>>>>>>> > > > > > > > > > measuring code health via post-commit over time,
>>>>>>>>> which there are
>>>>>>>>> > > > > and
>>>>>>>>> > > > > > > > the
>>>>>>>>> > > > > > > > > > passing test rate is high (Huzzah!). The one
>>>>>>>>> thing that
>>>>>>>>> > > > > surprised me
>>>>>>>>> > > > > > > > from
>>>>>>>>> > > > > > > > > > your summary is that when Go introduces generics
>>>>>>>>> it won't result
>>>>>>>>> > > > > in any
>>>>>>>>> > > > > > > > > > backwards incompatible changes in Apache Beam.
>>>>>>>>> That's great
>>>>>>>>> > > > > news, but
>>>>>>>>> > > > > > > > does
>>>>>>>>> > > > > > > > > > it mean there will be a need to support both
>>>>>>>>> non-generic and
>>>>>>>>> > > > > generic
>>>>>>>>> > > > > > > > APIs
>>>>>>>>> > > > > > > > > > moving forward? It seems like generics will be
>>>>>>>>> introduced in the
>>>>>>>>> > > > > Go
>>>>>>>>> > > > > > > > 1.17
>>>>>>>>> > > > > > > > > > release (optimistically) in August this year.
>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert
>>>>>>>>> Burke <
>>>>>>>>> > > > > > > > lostluck@apache.org>
>>>>>>>>> > > > > > > > > > wrote:
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache
>>>>>>>>> Beam Go SDK
>>>>>>>>> > > > > > > > experimental.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a
>>>>>>>>> community, and any
>>>>>>>>> > > > > > > > conditions
>>>>>>>>> > > > > > > > > > that remain that would prevent the exit.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> tl;dr;
>>>>>>>>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I
>>>>>>>>> have both.
>>>>>>>>> > > > > > > > > > > >> > >> This entails including it officially
>>>>>>>>> in the Release
>>>>>>>>> > > > > process,
>>>>>>>>> > > > > > > > > > removing the various "experimental" text
>>>>>>>>> throughout the repo etc,
>>>>>>>>> > > > > > > > > > > >> > >> and otherwise treating it like Python
>>>>>>>>> and Java. Some Go
>>>>>>>>> > > > > > > > specific
>>>>>>>>> > > > > > > > > > tasks around dep versioning.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the beam model
>>>>>>>>> efficiently for
>>>>>>>>> > > > > most
>>>>>>>>> > > > > > > > batch
>>>>>>>>> > > > > > > > > > tasks, including basic windowing.
>>>>>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and
>>>>>>>>> are tested on all
>>>>>>>>> > > > > > > > Portable
>>>>>>>>> > > > > > > > > > runners.
>>>>>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change
>>>>>>>>> in incompatible
>>>>>>>>> > > > > ways
>>>>>>>>> > > > > > > > going
>>>>>>>>> > > > > > > > > > forward.
>>>>>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written
>>>>>>>>> through
>>>>>>>>> > > > > SplittableDoFns or
>>>>>>>>> > > > > > > > > > via Cross Language transforms.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete,
>>>>>>>>> but keeping it
>>>>>>>>> > > > > > > > experimental
>>>>>>>>> > > > > > > > > > doesn't help with that any further.
>>>>>>>>> > > > > > > > > > > >> > >> Communities grow through contributions
>>>>>>>>> and use, and
>>>>>>>>> > > > > > > > experimental
>>>>>>>>> > > > > > > > > > markers dissuade users.
>>>>>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand
>>>>>>>>> what can be done
>>>>>>>>> > > > > with
>>>>>>>>> > > > > > > > the
>>>>>>>>> > > > > > > > > > SDK. (Contributions welcome)
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Typically when we call an SDK or API
>>>>>>>>> Experimental, it's
>>>>>>>>> > > > > > > > because
>>>>>>>>> > > > > > > > > > there's a risk that API or behaviors may change
>>>>>>>>> significantly.
>>>>>>>>> > > > > > > > > > > >> > >> This, in turn, leads to additional work
>>>>>>>>> for users of
>>>>>>>>> > > > > the SDK
>>>>>>>>> > > > > > > > on
>>>>>>>>> > > > > > > > > > every release which leads to sticking to older
>>>>>>>>> versions or
>>>>>>>>> > > > > forking
>>>>>>>>> > > > > > > > > > > >> > >> to preserve behavior. Version updates
>>>>>>>>> should be looked
>>>>>>>>> > > > > > > > forward
>>>>>>>>> > > > > > > > > > to, and viewed as having little risk. Further,
>>>>>>>>> while there's been
>>>>>>>>> > > > > > > > > > > >> > >> previous discussion about what the "low
>>>>>>>>> bar" is for a
>>>>>>>>> > > > > new
>>>>>>>>> > > > > > > > SDK, it
>>>>>>>>> > > > > > > > > > hasn't been summarily applied to the Go SDK. I
>>>>>>>>> feel this has
>>>>>>>>> > > > > > > > > > > >> > >> hurt development and contribution of
>>>>>>>>> new SDK languages
>>>>>>>>> > > > > > > > (inherent
>>>>>>>>> > > > > > > > > > difficulty of SDK development notwithstanding).
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't
>>>>>>>>> entirely clear
>>>>>>>>> > > > > what the
>>>>>>>>> > > > > > > > > > Beam Model should look like in an opinionated
>>>>>>>>> language like Go.
>>>>>>>>> > > > > > > > > > > >> > >> Their initial take (see
>>>>>>>>> > > > > > > > > > https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>>>> [0]) goes into
>>>>>>>>> > > > > detail
>>>>>>>>> > > > > > > > about what it
>>>>>>>>> > > > > > > > > > means for a language without
>>>>>>>>> > > > > > > > > > > >> > >> Generics, or overloading, or
>>>>>>>>> inheritance to implement
>>>>>>>>> > > > > the
>>>>>>>>> > > > > > > > beam
>>>>>>>>> > > > > > > > > > model. One could largely throw away static types
>>>>>>>>> (like Python),
>>>>>>>>> > > > > > > > > > > >> > >> but this approach rings hollow for Go.
>>>>>>>>> It would not do
>>>>>>>>> > > > > if the
>>>>>>>>> > > > > > > > > > approach couldn't grow and scale to the Beam
>>>>>>>>> Model. It's also
>>>>>>>>> > > > > hard
>>>>>>>>> > > > > > > > > > > >> > >> to tell if an API is any good before
>>>>>>>>> there are users.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Further, in the early days of
>>>>>>>>> Portability, there
>>>>>>>>> > > > > wasn't a
>>>>>>>>> > > > > > > > way to
>>>>>>>>> > > > > > > > > > write scalable DoFns, dynamically or otherwise.
>>>>>>>>> It's an
>>>>>>>>> > > > > incredible
>>>>>>>>> > > > > > > > > > > >> > >> bottleneck to need to do all initial
>>>>>>>>> fanout of work on
>>>>>>>>> > > > > a
>>>>>>>>> > > > > > > > single
>>>>>>>>> > > > > > > > > > machine, write everything to a Reshuffle, just
>>>>>>>>> in order to scale
>>>>>>>>> > > > > up.
>>>>>>>>> > > > > > > > > > > >> > >> Without being able to scale, Beam is
>>>>>>>>> little more than
>>>>>>>>> > > > > > > > overhead.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are
>>>>>>>>> met within the
>>>>>>>>> > > > > Go SDK
>>>>>>>>> > > > > > > > for
>>>>>>>>> > > > > > > > > > open source.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Background
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the beam
>>>>>>>>> repo for a few
>>>>>>>>> > > > > years
>>>>>>>>> > > > > > > > now,
>>>>>>>>> > > > > > > > > > since it was accidentally merged into master.
>>>>>>>>> > > > > > > > > > > >> > >> Since then it's been called
>>>>>>>>> experimental, and not
>>>>>>>>> > > > > officially
>>>>>>>>> > > > > > > > > > part of the releases.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Of the SDKs, it's the one that was always designed
>>>>>>>>> around Beam
>>>>>>>>> > > > > Portability
>>>>>>>>> > > > > > > > > > first. It never had any "Legacy" (SDK x Runner
>>>>>>>>> specific)
>>>>>>>>> > > > > workers.
>>>>>>>>> > > > > > > > > > > >> > >> It's always used the Beam Pipeline
>>>>>>>>> protos and FnAPI to
>>>>>>>>> > > > > > > > execute
>>>>>>>>> > > > > > > > > > jobs, first with some very experimental code on
>>>>>>>>> Dataflow, but now
>>>>>>>>> > > > > > > > > > > >> > >> on all portable supported runners,
>>>>>>>>> like Flink, Spark,
>>>>>>>>> > > > > the
>>>>>>>>> > > > > > > > Python
>>>>>>>>> > > > > > > > > > Portable runner, and Dataflow.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> API Stability
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed
>>>>>>>>> its user API
>>>>>>>>> > > > > for DoFn
>>>>>>>>> > > > > > > > > > and pipeline construction since it was first
>>>>>>>>> merged in, and
>>>>>>>>> > > > > there are
>>>>>>>>> > > > > > > > no
>>>>>>>>> > > > > > > > > > > >> > >> changes to that on the horizon that
>>>>>>>>> can't be made in a
>>>>>>>>> > > > > > > > backwards
>>>>>>>>> > > > > > > > > > compatible manner. Largely these are related to
>>>>>>>>> New Features, or
>>>>>>>>> > > > > > > > > > > >> > >> usability improvements enabled by the
>>>>>>>>> advent of Go
>>>>>>>>> > > > > Generics
>>>>>>>>> > > > > > > > > > (think of "real" KV, emitter, and iterator
>>>>>>>>> types).
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK
>>>>>>>>> has largely been
>>>>>>>>> > > > > under
>>>>>>>>> > > > > > > > work
>>>>>>>>> > > > > > > > > > for use within Google. Its use is called
>>>>>>>>> FlumeGo, representing
>>>>>>>>> > > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top
>>>>>>>>> of Flume,
>>>>>>>>> > > > > Google's
>>>>>>>>> > > > > > > > batch
>>>>>>>>> > > > > > > > > > pipeline processing engine. Thus most of the
>>>>>>>>> focus has been on improving
>>>>>>>>> > > > > > > > > > > >> > >> batch execution. FlumeGo sees ample
>>>>>>>>> use today, and
>>>>>>>>> > > > > there
>>>>>>>>> > > > > > > > hasn't
>>>>>>>>> > > > > > > > > > been a call for fundamental changes to the API
>>>>>>>>> for ergonomic or
>>>>>>>>> > > > > > > > > > > >> > >> usability concerns.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Scalability
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Google could get away without the Go
>>>>>>>>> SDK having an SDK
>>>>>>>>> > > > > side
>>>>>>>>> > > > > > > > > > scalability solution as a result of its
>>>>>>>>> integration with Flume.
>>>>>>>>> > > > > > > > > > > >> > >> However, those days are now past.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK now supports
>>>>>>>>> SplittableDoFns along with
>>>>>>>>> > > > > Dynamic
>>>>>>>>> > > > > > > > > > Splitting, which supports writing scalable batch
>>>>>>>>> transforms
>>>>>>>>> > > > > natively
>>>>>>>>> > > > > > > > > > > >> > >> in the Go SDK.
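To make "Dynamic Splitting" concrete, here is a minimal standalone sketch of the idea over an offset-range restriction: while an element is being processed, the runner asks the worker to keep part of its remaining work and hand the rest back for scheduling elsewhere. The offsetRange type and trySplit function are hypothetical illustrations; the Go SDK has its own restriction and tracker types, which this does not reproduce.

```go
package main

import "fmt"

// offsetRange is a hypothetical restriction covering offsets [Start, End).
type offsetRange struct {
	Start, End int64
}

// trySplit: offsets before next are already claimed by the worker. The
// primary keeps roughly `fraction` of the remaining offsets; the residual
// is handed back to the runner. ok is false when either side would
// receive no unclaimed work.
func trySplit(r offsetRange, next int64, fraction float64) (primary, residual offsetRange, ok bool) {
	remaining := r.End - next
	splitAt := next + int64(float64(remaining)*fraction)
	if splitAt <= next || splitAt >= r.End {
		return r, offsetRange{}, false
	}
	return offsetRange{Start: r.Start, End: splitAt}, offsetRange{Start: splitAt, End: r.End}, true
}

func main() {
	// Offsets [0, 20) are done when the runner requests a 0.5 split.
	p, res, ok := trySplit(offsetRange{Start: 0, End: 100}, 20, 0.5)
	fmt.Println(p, res, ok) // {0 60} {60 100} true
}
```

Because the split happens while the element is running, one large input can be spread across many workers without first fanning out on a single machine.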
>>>>>>>>> > > > > > > > > > > >> > >> The SDK also supports Cross Language
>>>>>>>>> Transforms, with
>>>>>>>>> > > > > Beam
>>>>>>>>> > > > > > > > > > Schema encodings. With it, production hardened
>>>>>>>>> transforms
>>>>>>>>> > > > > > > > > > > >> > >> from Java and Python are a wrapper
>>>>>>>>> away.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who
>>>>>>>>> implemented the SDF
>>>>>>>>> > > > > side
>>>>>>>>> > > > > > > > work
>>>>>>>>> > > > > > > > > > and completed the Xlang work) is adding a
>>>>>>>>> wrapper for the
>>>>>>>>> > > > > > > > > > > >> > >> Java Kafka IO using Cross Language
>>>>>>>>> Transforms, which
>>>>>>>>> > > > > has often
>>>>>>>>> > > > > > > > > > been requested. This will also enable use of the
>>>>>>>>> Beam SQL
>>>>>>>>> > > > > > > > > > > >> > >> transforms that Java enables.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Features
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam Core.
>>>>>>>>> The Go SDK
>>>>>>>>> > > > > implements
>>>>>>>>> > > > > > > > > > standard coders, allows for user DoFns and
>>>>>>>>> CombineFns, and provides access
>>>>>>>>> > > > > > > > > > > >> > >> to core transforms like Flatten,
>>>>>>>>> GroupByKey, and
>>>>>>>>> > > > > features
>>>>>>>>> > > > > > > > like
>>>>>>>>> > > > > > > > > > Side Inputs, Windowing, and User Metrics.
>>>>>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully
>>>>>>>>> supported for batch even
>>>>>>>>> > > > > > > > through
>>>>>>>>> > > > > > > > > > lifted combines in the 2.32.0 release.
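A minimal sketch of what combiner lifting does, ignoring windows: each bundle is pre-combined into a small accumulator before the shuffle, and only the accumulators are merged afterwards. The method names below follow, to our understanding, those the Go SDK looks for on a structural CombineFn (CreateAccumulator, AddInput, MergeAccumulators, ExtractOutput). This example does not import Beam; it calls the methods by hand to mimic the lifted execution, and meanFn/meanAccum are hypothetical names.

```go
package main

import "fmt"

// meanFn is a hypothetical combiner computing the mean of float64 inputs.
type meanFn struct{}

type meanAccum struct {
	Sum   float64
	Count int64
}

func (meanFn) CreateAccumulator() meanAccum { return meanAccum{} }

func (meanFn) AddInput(a meanAccum, v float64) meanAccum {
	return meanAccum{Sum: a.Sum + v, Count: a.Count + 1}
}

func (meanFn) MergeAccumulators(a, b meanAccum) meanAccum {
	return meanAccum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAccum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

func main() {
	fn := meanFn{}
	bundles := [][]float64{{1, 2, 3}, {4, 5}, {}}
	// Before the shuffle: one partial accumulator per bundle.
	partials := make([]meanAccum, 0, len(bundles))
	for _, b := range bundles {
		acc := fn.CreateAccumulator()
		for _, v := range b {
			acc = fn.AddInput(acc, v)
		}
		partials = append(partials, acc)
	}
	// After the shuffle: merge the partials and extract the result.
	total := fn.CreateAccumulator()
	for _, p := range partials {
		total = fn.MergeAccumulators(total, p)
	}
	fmt.Println(fn.ExtractOutput(total)) // 3
}
```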
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be
>>>>>>>>> versatile for
>>>>>>>>> > > > > batch
>>>>>>>>> > > > > > > > > > execution on portable runners, and for simple
>>>>>>>>> streaming
>>>>>>>>> > > > > pipelines.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Repo Testing
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> On precommit, the Go SDK runs all its
>>>>>>>>> unit tests. On
>>>>>>>>> > > > > top of
>>>>>>>>> > > > > > > > > > that, it runs all its integration tests against
>>>>>>>>> the Python
>>>>>>>>> > > > > Portable
>>>>>>>>> > > > > > > > runner,
>>>>>>>>> > > > > > > > > > > >> > >> making it quick and robust to detect
>>>>>>>>> breaking changes
>>>>>>>>> > > > > without
>>>>>>>>> > > > > > > > > > overspending community resources. Those same
>>>>>>>>> tests are also
>>>>>>>>> > > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The tests are executable against all
>>>>>>>>> runners via the
>>>>>>>>> > > > > > > > appropriate
>>>>>>>>> > > > > > > > > > Go commands (if you've stood up your own job
>>>>>>>>> management server),
>>>>>>>>> > > > > > > > > > > >> > >> or Gradle commands (which will spin up
>>>>>>>>> runner
>>>>>>>>> > > > > instances for
>>>>>>>>> > > > > > > > > > you). Documentation for executing tests and
>>>>>>>>> adding new ones
>>>>>>>>> > > > > > > > > > > >> > >> is on the wiki. [2] They are
>>>>>>>>> accessible to Go
>>>>>>>>> > > > > developers as
>>>>>>>>> > > > > > > > > > they're implemented with the standard Go testing
>>>>>>>>> tools.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Shortcomings
>>>>>>>>> > > > > > > > > > > >> > >> That said, there's still much to do.
>>>>>>>>> Let me briefly
>>>>>>>>> > > > > tell you
>>>>>>>>> > > > > > > > > > what doesn't work, and it's up to you to weigh
>>>>>>>>> whether they block
>>>>>>>>> > > > > > > > > > > >> > >> being out of experimental.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> At present, only textio has been
>>>>>>>>> implemented as
>>>>>>>>> > > > > a Splittable
>>>>>>>>> > > > > > > > > > DoFn.
>>>>>>>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in,
>>>>>>>>> it will serve as
>>>>>>>>> > > > > the
>>>>>>>>> > > > > > > > > > first example for future contributions for
>>>>>>>>> > > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
>>>>>>>>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at
>>>>>>>>> this point
>>>>>>>>> > > > > users are
>>>>>>>>> > > > > > > > > > empowered to write their own DoFns or wrap
>>>>>>>>> existing transforms
>>>>>>>>> > > > > for
>>>>>>>>> > > > > > > > Cross
>>>>>>>>> > > > > > > > > > Language use.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> In the core SDK, more streaming
>>>>>>>>> focused features have
>>>>>>>>> > > > > yet to
>>>>>>>>> > > > > > > > be
>>>>>>>>> > > > > > > > > > implemented, but they're largely additions to
>>>>>>>>> what exists already
>>>>>>>>> > > > > > > > > > > >> > >> rather than total rebuilds. Much of
>>>>>>>>> the work is
>>>>>>>>> > > > > defining
>>>>>>>>> > > > > > > > how a
>>>>>>>>> > > > > > > > > > user specifies their desires, and turning those
>>>>>>>>> into the
>>>>>>>>> > > > > appropriate
>>>>>>>>> > > > > > > > > > > >> > >> FnAPI requests at execution time. Back
>>>>>>>>> in October I
>>>>>>>>> > > > > wrote at
>>>>>>>>> > > > > > > > > > length on the wiki [1] what's missing for
>>>>>>>>> additional streaming
>>>>>>>>> > > > > > > > features.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> While we have bolstered our testing recently, there's likely still more
>>>>>>>>> > > > > > > > > > > >> > >> we could test to improve our confidence in the SDK, in particular
>>>>>>>>> > > > > > > > > > > >> > >> regarding the included transform libraries and examples.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Moving Forward
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the
>>>>>>>>> > > > > > > > > > > >> > >> Beam Programming Guide. I've audited the guide [3], and am beginning to
>>>>>>>>> > > > > > > > > > > >> > >> add missing content and fill in the Go-specific gaps. This will be tied
>>>>>>>>> > > > > > > > > > > >> > >> to improving the GoDoc with more Go-specific user documentation that
>>>>>>>>> > > > > > > > > > > >> > >> isn't appropriate for the BPG, and to resolving the LICENSE issue
>>>>>>>>> > > > > > > > > > > >> > >> around the public display of that GoDoc.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the
>>>>>>>>> > > > > > > > > > > >> > >> SDK into the release process, and remove the "experimental" language
>>>>>>>>> > > > > > > > > > > >> > >> around the SDK. This largely entails updating the release scripts to
>>>>>>>>> > > > > > > > > > > >> > >> also build and publish the Go SDK Docker containers. As for releasing
>>>>>>>>> > > > > > > > > > > >> > >> the code, we're technically already doing so whenever we tag a release
>>>>>>>>> > > > > > > > > > > >> > >> branch [4].
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the
>>>>>>>>> > > > > > > > > > > >> > >> SDK to use Go Modules for dependency version control, which Daniel is
>>>>>>>>> > > > > > > > > > > >> > >> planning on working on after his Kafka task. This will put our repo
>>>>>>>>> > > > > > > > > > > >> > >> infrastructure, SDK contributors, and users on the same footing when it
>>>>>>>>> > > > > > > > > > > >> > >> comes to dependency management. It will remove the "+incompatible" tags
>>>>>>>>> > > > > > > > > > > >> > >> one sees on the pkg.go.dev list at [4].
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK,
>>>>>>>>> > > > > > > > > > > >> > >> and provide additional links as needed. I intentionally avoided a link
>>>>>>>>> > > > > > > > > > > >> > >> barrage in this email, as links can distract from the point: the SDK is
>>>>>>>>> > > > > > > > > > > >> > >> ready for folks to use, and we need to tell them that they can rather
>>>>>>>>> > > > > > > > > > > >> > >> than that they shouldn't.
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> Robert Burke
>>>>>>>>> > > > > > > > > > > >> > >> Defacto Beam Go TL
>>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>>> > > > > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>>>> > > > > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>>>>>>>>> > > > > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>>>>>>>>> > > > > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
>>>>>>>>> > > > > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>>>>>>>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Ahmet Altay <al...@google.com>.
Great. I retweeted that on the official account.

On Wed, Nov 24, 2021 at 10:14 AM Robert Burke <ro...@frantil.com> wrote:

> I think so.
>
> My tweet [1] on the topic got a bit of traction even without the official
> beam account boosting it.
>
> [1]
> https://twitter.com/lostluck/status/1456720240092467200?t=owVJd6ZuTVMUkNyvYNr4Xg&s=19
>
> On Wed, Nov 24, 2021, 10:11 AM Ahmet Altay <al...@google.com> wrote:
>
>> Thank you Rebo, and congratulations to everyone working on Go SDK :)
>>
>> @Robert Burke <re...@google.com> @Brittany Hermann <he...@google.com> @Sachin
>> Agarwal <sa...@google.com> - Should we share this on Beam's twitter
>> and other social media pages?
>>
>> On Fri, Nov 5, 2021 at 1:29 PM Robert Burke <ro...@frantil.com> wrote:
>>
>>> It's my great pleasure to announce that the Apache Beam Go SDK is no
>>> longer experimental. https://beam.apache.org/blog/go-sdk-release/
>>>
>>> Thank you everyone.
>>> Robert Burke
>>> Beam Go Busybody
>>>
>>> On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> At this point I just need an LGTM on the blog post PR, as the draft is
>>>> finalized.
>>>>
>>>> Udi added the sdks/v2.33.0 tag, which works as expected. I've also
>>>> verified that the appropriate container is used by default when not
>>>> specified, which was the last unknown in this process.
>>>>
>>>> Who's ready to release a new SDK? I am!
>>>>
>>>> https://github.com/apache/beam/pull/15894 (or join the exciting
>>>> reaction emoji on the top post).
>>>>
>>>>
>>>>
>>>> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>>>>
>>>>> The current draft of the exit blog post is
>>>>> https://github.com/apache/beam/pull/15894
>>>>> Comments are very welcome. I'm going to continue looking for Known
>>>>> issues (which will be linked to their respective JIRAs) tomorrow.
>>>>>
>>>>> Since RC1 is getting cycled, I can also go back to the original plan
>>>>> of v2.33.0, if we'd like to get it out this week.
>>>>>
>>>>>
>>>>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>>>>
>>>>>> Investigation confirmed that there's no way around the prefixed tags.
>>>>>> The explanation has been added as a comment on the JIRA.
>>>>>>
>>>>>> https://github.com/apache/beam/pull/15881 has the release script
>>>>>> updates.
>>>>>>
>>>>>> I'm working on the Exit blog post and the updated Go SDK roadmap.
>>>>>>
>>>>>> Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm
>>>>>> inclined to wait for that release to finish before publishing the blog
>>>>>> post. I'll link the draft PR here as soon as it's ready.
>>>>>>
>>>>>> Once 2.34.0 is released, I'm inclined to still prefix-tag 2.33.0 as
>>>>>> well, so there isn't a gap in versions between the pre-module and
>>>>>> module-based code.
>>>>>>
>>>>>> Once the post is published, that'll be the end of this thread.
>>>>>>
>>>>>> Thank you very much everyone.
>>>>>>
>>>>>> Robert Burke
>>>>>> Beam Go Busybody
>>>>>>
>>>>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> +1 to extra tags. They'll be trivial to add to our release process,
>>>>>>> and git tags are lightweight by design so I don't foresee any problems.
>>>>>>>
>>>>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>>>>> too much of a problem (until, hopefully, it's fixed on the Go side).
>>>>>>>>
>>>>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > With Kyle's help with the additional tagging of the next RC, we have
>>>>>>>> > validated that this is the currently correct approach.
>>>>>>>> >
>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>>>>> >
>>>>>>>> > Or even:
>>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to the latest tagged version)
>>>>>>>> >
>>>>>>>> > The main cost of this approach is doubling the number of tags in the
>>>>>>>> > tags list (https://github.com/apache/beam/tags), which is not ideal,
>>>>>>>> > but overall a small cost. There's no need for a "full publish" of these
>>>>>>>> > additional tags, so we won't be doubling our "releases" (see
>>>>>>>> > https://github.com/apache/beam/releases).
>>>>>>>> >
>>>>>>>> > I'll still be filing a bug against the Go commands, since the mandatory
>>>>>>>> > prefixing is unintuitive and seems unnecessary. If the Go tools stop
>>>>>>>> > requiring it, we can always delete the tags from the affected branches
>>>>>>>> > and cease the behavior going forward. I'll search through the existing
>>>>>>>> > Go issues first, however, to see if this has been previously discussed,
>>>>>>>> > and report my findings here either way.
>>>>>>>> >
>>>>>>>> > This does require two small changes to the release guide: the RC
>>>>>>>> > tagging script, and the final tagging step:
>>>>>>>> >
>>>>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>>>>> >
>>>>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>>>>> >
>>>>>>>> > I'll make this change later this week (or early next), assuming there
>>>>>>>> > are no objections.
>>>>>>>> >
>>>>>>>> > Thank you all very much for your patience,
>>>>>>>> > Robert Burke
>>>>>>>> > Beam Go Busybody
>>>>>>>> >
>>>>>>>> >
>>>>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>>> > > With much research in reading the Go Modules documentation, I have
>>>>>>>> > > confirmed what the issue is.
>>>>>>>> > >
>>>>>>>> > > We added the go.mod file to sdks/ under the repo root because it's a
>>>>>>>> > > cleaner spot for the change, it captures the Java and Python container
>>>>>>>> > > boot code (written in Go) into the module, and it avoids conflicts in
>>>>>>>> > > interpretations of the vendor directory that lives at the root level.
>>>>>>>> > >
>>>>>>>> > > However, we missed that when doing so, the standard version tags would
>>>>>>>> > > only apply to modules at the root level, not to modules in
>>>>>>>> > > subdirectories. See https://golang.org/ref/mod#vcs-version, but quoting
>>>>>>>> > > the important paragraph:
>>>>>>>> > >
>>>>>>>> > > > If a module is defined in a subdirectory within the repository, that
>>>>>>>> > > > is, the module subdirectory portion of the module path is not empty,
>>>>>>>> > > > then each tag name must be prefixed with the module subdirectory,
>>>>>>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls
>>>>>>>> > > > is defined in the gopls subdirectory of the repository with root path
>>>>>>>> > > > golang.org/x/tools. The version v0.4.0 of that module must have the
>>>>>>>> > > > tag named gopls/v0.4.0 in that repository.
>>>>>>>> > >
>>>>>>>> > > Specifically, for the Go SDK to be fetched at the right version, we
>>>>>>>> > > need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>>>>> > >
>>>>>>>> > > So, the fix for the Go versioning issue is to amend our release process
>>>>>>>> > > (including generating Release Candidate builds) to also add a prefixed
>>>>>>>> > > version tag with the same version.
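A minimal sketch of the tag-naming rule quoted above, under the assumption (confirmed in this thread) that the go.mod for module github.com/apache/beam/sdks/v2 lives in the sdks/ directory, so the tag prefix is "sdks" and the /v2 major-version suffix is not part of it. The helper name `moduleTag` is hypothetical; it is not taken from Beam's release scripts.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleTag returns the VCS tag name the go command looks for when resolving
// `version` of a module whose go.mod lives in `subdir` of the repository.
// An empty subdir means the module sits at the repository root, so the plain
// version tag is used. Hypothetical helper, for illustration only.
func moduleTag(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version
	}
	return subdir + "/" + version
}

func main() {
	// go.mod for github.com/apache/beam/sdks/v2 lives in sdks/, so the
	// prefix is "sdks" (the /v2 suffix is not a directory here).
	for _, v := range []string{"v2.33.0", "v2.34.0-RC1"} {
		fmt.Println(moduleTag("sdks", v))
	}
	// The example from the Go Modules reference.
	fmt.Println(moduleTag("gopls", "v0.4.0"))
}
```

In practice this means each release or RC commit carries two tags (for example v2.34.0-RC1 and sdks/v2.34.0-RC1), which is the doubling of the tag list mentioned in this thread.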
>>>>>>>> > >
>>>>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are
>>>>>>>> > > no objections, we can retroactively add such a prefixed tag to the
>>>>>>>> > > 2.33.0 release branch. At that point I can also write the official
>>>>>>>> > > Experimental Exit blog post.
>>>>>>>> > >
>>>>>>>> > > Thank you all for your patience.
>>>>>>>> > > Robert Burke
>>>>>>>> > >
>>>>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>>>>> > > >
>>>>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>> > > >
>>>>>>>> > > > > This is a status update.
>>>>>>>> > > > >
>>>>>>>> > > > > At this point 2.33.0 is released, but there are difficulties with
>>>>>>>> > > > > accessing the tagged versions using the standard Go tools. It's
>>>>>>>> > > > > currently under investigation.
>>>>>>>> > > > >
>>>>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>>>>>>>> > > > > will populate the go.mod file with a pseudo-version rather than the
>>>>>>>> > > > > latest tag (v2.33.0). For example, the line looks like:
>>>>>>>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>>>>>>> > > > >
>>>>>>>> > > > > While this will work, it's not the desired experience for users at
>>>>>>>> > > > > this point. The current downside is that, for some reason, the
>>>>>>>> > > > > releases are not meaningful targets. However, we retain the other
>>>>>>>> > > > > benefits of Go Modules (actual dependency versioning, management by
>>>>>>>> > > > > the go tools).
>>>>>>>> > > > >
>>>>>>>> > > > > The issue is some combination of the go tooling [A], the fact that we
>>>>>>>> > > > > added a go.mod file outside of the repo root [B], and the fact that
>>>>>>>> > > > > we did not increment the major version (v2 -> v3) when adding the
>>>>>>>> > > > > go.mod file [C].
>>>>>>>> > > > >
>>>>>>>> > > > > [B] From the Go documentation, this should be legal and fine, even if
>>>>>>>> > > > > it's not recommended. This is fortunate, because placing it at the
>>>>>>>> > > > > root of the repo would have played poorly with the root vendor
>>>>>>>> > > > > directory, which the go tools have opinions on.
>>>>>>>> > > > >
>>>>>>>> > > > > [C] The Go Modules documentation recommends incrementing the major
>>>>>>>> > > > > version when transitioning to Go Modules. However, it never said this
>>>>>>>> > > > > was required, nor did it indicate this failure mode. If anything,
>>>>>>>> > > > > this should be documented in those docs, if it's not another bug. We
>>>>>>>> > > > > would not necessarily want to declare a global v3 for Beam at this
>>>>>>>> > > > > time just for the Go SDK; it would become confusing rather quickly.
>>>>>>>> > > > > Notionally, there are some larger breaking changes the Java and
>>>>>>>> > > > > Python SDKs would want to make in such an event, so it's a larger
>>>>>>>> > > > > conversation that is out of scope at this time.
>>>>>>>> > > > >
>>>>>>>> > > > > This leaves [A], where some misunderstanding of the documented
>>>>>>>> > > > > semantics occurred. I certainly expected the tagged version of the
>>>>>>>> > > > > non-root Go module to be inherited from the parent, not wholesale
>>>>>>>> > > > > ignored. As a result, I'll be filing a bug against the go tools to
>>>>>>>> > > > > determine this, and see what paths forward exist.
>>>>>>>> > > > >
>>>>>>>> > > > > It's my hope to resolve this before we write a proper Experimental
>>>>>>>> > > > > Exit blog post for the Go SDK.
>>>>>>>> > > > >
>>>>>>>> > > > > Thank you for your patience, and time.
>>>>>>>> > > > > Robert Burke
>>>>>>>> > > > > Beam Go Busybody
>>>>>>>> > > > >
>>>>>>>> > > > >
>>>>>>>> > > > >
>>>>>>>> > > > >
>>>>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses
>>>>>>>> > > > > > Go Modules for dependency management, simplifying Go SDK
>>>>>>>> > > > > > contributions [2].
>>>>>>>> > > > > >
>>>>>>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go
>>>>>>>> > > > > > module for the whole SDK, tests, examples, and any support code for
>>>>>>>> > > > > > the container boot builds. This excludes the Go SDK Code Katas [3]
>>>>>>>> > > > > > Go modules, which can be updated once 2.33.0 has been released.
>>>>>>>> > > > > >
>>>>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and
>>>>>>>> > > > > > by default uses the release-specific container for Docker execution
>>>>>>>> > > > > > jobs. For at least the 2.33.0 release, this does mean that manual
>>>>>>>> > > > > > validation will need to explicitly specify RC versions of
>>>>>>>> > > > > > containers. However, given that the Go SDK container and worker boot
>>>>>>>> > > > > > process rarely change, this is unlikely to be an issue.
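A minimal sketch of the defaulting behavior described above: the worker container defaults to an image tagged with the SDK's own release version, and an explicit setting always wins, which is why RC validation has to name the RC image by hand. The repository name apache/beam_go_sdk, the tag format, and the helper `defaultContainer` are assumptions for illustration, not the code in PR 15365.

```go
package main

import "fmt"

// Assumed image repository; the real name comes from the SDK's release tooling.
const imageRepo = "apache/beam_go_sdk"

// defaultContainer picks the worker container image for a job. An explicit
// override (for example, an RC image during release validation) always wins;
// otherwise the image is tagged with the SDK's own release version.
// Hypothetical helper, for illustration only.
func defaultContainer(override, sdkVersion string) string {
	if override != "" {
		return override
	}
	return imageRepo + ":" + sdkVersion
}

func main() {
	fmt.Println(defaultContainer("", "2.33.0"))
	// A hypothetical, explicitly named candidate image.
	fmt.Println(defaultContainer("registry.example.com/beam_go_sdk:candidate", "2.33.0"))
}
```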
>>>>>>>> > > > > >
>>>>>>>> > > > > > At present I'm cleaning up some of the references to experimental,
>>>>>>>> > > > > > and making it clear that 2.33.0 is the first non-experimental
>>>>>>>> > > > > > release (even though that's 4-6 weeks out from the actual release).
>>>>>>>> > > > > > CHANGES.md will be updated to note the event, but a larger blog post
>>>>>>>> > > > > > will happen after the release goes public.
>>>>>>>> > > > > >
>>>>>>>> > > > > > Cheers,
>>>>>>>> > > > > > Robert Burke
>>>>>>>> > > > > > Defacto Beam Go TL.
>>>>>>>> > > > > >
>>>>>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>>>>> > > > > >
>>>>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>>>>>>> > > > > > > +1, congratulations & thank you!
>>>>>>>> > > > > > >
>>>>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>> > > > > > >
>>>>>>>> > > > > > > > Regarding the documentation update: the initial PR is
>>>>>>>> > > > > > > > https://github.com/apache/beam/pull/15057, which goes up to
>>>>>>>> > > > > > > > section ~4.3.
>>>>>>>> > > > > > > > JIRA link for Programming Guide changes:
>>>>>>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>>>>>>> > > > > > > >
>>>>>>>> > > > > > > >
>>>>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <robert@frantil.com> wrote:
>>>>>>>> > > > > > > > > Yup!
>>>>>>>> > > > > > > > >
>>>>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully
>>>>>>>> > > > > > > > > into the Beam Programming Guide. I've audited the guide, and am
>>>>>>>> > > > > > > > > beginning to add missing content and fill in the Go-specific
>>>>>>>> > > > > > > > > gaps. This will be tied to improving the GoDoc with more
>>>>>>>> > > > > > > > > Go-specific user documentation that isn't appropriate for the
>>>>>>>> > > > > > > > > BPG.
>>>>>>>> > > > > > > > >
>>>>>>>> > > > > > > > > My audit of the guide is here:
>>>>>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>>>> > > > > > > > >
>>>>>>>> > > > > > > > > The other sheets focus on features and tests. The feature page
>>>>>>>> > > > > > > > > looks worse than it is, as it was more productive to focus on
>>>>>>>> > > > > > > > > what isn't available than on what is. That's a snapshot of my
>>>>>>>> > > > > > > > > actual working sheet, but I'll be updating it as needed.
>>>>>>>> > > > > > > > >
>>>>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > > > > Oops, forgot to write one question. Will this come with revamped
>>>>>>>> > > > > > > > > > website instructions/docs for golang too?
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>>>>> > > > > > > > > > >
>>>>>>>> > > > > > > > > > > Huge +1
>>>>>>>> > > > > > > > > > >
>>>>>>>> > > > > > > > > > > This is definitely something many people have asked about, so it
>>>>>>>> > > > > > > > > > > is great to see it finally happening.
>>>>>>>> > > > > > > > > > >
>>>>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
>>>>>>>> > > > > > > > > > > >
>>>>>>>> > > > > > > > > > > > +1 awesome
>>>>>>>> > > > > > > > > > > >
>>>>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules
>>>>>>>> > > > > > > > > > > >> and the LICENSE issue) done before the 2.32 cut, and certainly before
>>>>>>>> > > > > > > > > > > >> the 2.33 cut if release images aren't added to the 2.32 process.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a
>>>>>>>> > > > > > > > > > > >> harder break between a newer generics-first API and the current
>>>>>>>> > > > > > > > > > > >> version, but there's no rush. Generics/type parameters in Go aren't
>>>>>>>> > > > > > > > > > > >> identical to the feature referred to by that term in Java, C++, Rust,
>>>>>>>> > > > > > > > > > > >> etc., so it'll take a bit of time for that expertise to develop.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> However, because of the current nature of Go, we had to have pretty
>>>>>>>> > > > > > > > > > > >> sophisticated reflective analysis to handle DoFns and map them to
>>>>>>>> > > > > > > > > > > >> their graph inputs. So adding new helpers, like KV, emitter, and
>>>>>>>> > > > > > > > > > > >> iterator types, shouldn't be too difficult. Changes to the Go SDK
>>>>>>>> > > > > > > > > > > >> internals to use generics (like the implementation of stats DoFns such
>>>>>>>> > > > > > > > > > > >> as Min, Max, etc.) could also be made transparently to most users, and
>>>>>>>> > > > > > > > > > > >> certainly any of the framework for execution-time handling (the
>>>>>>>> > > > > > > > > > > >> "worker's SDK harness") could be cleaned up if need be. Finally, more
>>>>>>>> > > > > > > > > > > >> sophisticated DoFn registration and code generation could replace the
>>>>>>>> > > > > > > > > > > >> optional code generator entirely, saving some users a `go generate`
>>>>>>>> > > > > > > > > > > >> step and simplifying getting improved execution performance.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Changing things like making a type-parameterized PCollection would be
>>>>>>>> > > > > > > > > > > >> far more involved, as would trying to use some kind of Apply format.
>>>>>>>> > > > > > > > > > > >> The lack of method overrides prevents the apply-chaining approach, or
>>>>>>>> > > > > > > > > > > >> at least prevents it from working simply.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't
>>>>>>>> > > > > > > > > > > >> until next year. See https://blog.golang.org/generics-proposal for
>>>>>>>> > > > > > > > > > > >> details.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a
>>>>>>>> > > > > > > > > > > >> register-based calling convention, leading to a modest performance
>>>>>>>> > > > > > > > > > > >> improvement across the board.
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> Cheers,
>>>>>>>> > > > > > > > > > > >> Robert Burke
>>>>>>>> > > > > > > > > > > >>
>>>>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
>>>>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go
>>>>>>>> > > > > > > > > > > >> > Modules issues are solved. I don't think an SDK needs to support every
>>>>>>>> > > > > > > > > > > >> > feature to be accepted, especially now that we can do cross-language
>>>>>>>> > > > > > > > > > > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
>>>>>>>> > > > > > > > > > > >> > streaming, my understanding is that Go supports the streaming model
>>>>>>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on a streaming runner, even
>>>>>>>> > > > > > > > > > > >> > if more advanced features like state and timers aren't yet available.)
>>>>>>>> > > > > > > > > > > >> >
>>>>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>>>>> > > > > > > > > > > >> >
>>>>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are
>>>>>>>> > > > > > > > > > > >> > > completed and the LICENSE issue is resolved. I don't think that lacking
>>>>>>>> > > > > > > > > > > >> > > streaming support is a blocker. The other thing I checked was whether
>>>>>>>> > > > > > > > > > > >> > > there were metrics available on metrics.beam.apache.org, specifically
>>>>>>>> > > > > > > > > > > >> > > for measuring code health via post-commit over time, which there are,
>>>>>>>> > > > > > > > > > > >> > > and the passing test rate is high (Huzzah!). The one thing that
>>>>>>>> > > > > > > > > > > >> > > surprised me from your summary is that when Go introduces generics it
>>>>>>>> > > > > > > > > > > >> > > won't result in any backwards incompatible changes in Apache Beam.
>>>>>>>> > > > > > > > > > > >> > > That's great news, but does it mean there will be a need to support
>>>>>>>> > > > > > > > > > > >> > > both non-generic and generic APIs moving forward? It seems like
>>>>>>>> > > > > > > > > > > >> > > generics will be introduced in the Go 1.17 release (optimistically) in
>>>>>>>> > > > > > > > > > > >> > > August this year.
>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>> > > > > > > > > > > >> > >
>>>>>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a community, and any conditions that
>>>>>>>> > > > > > > > > > > >> > >> remain that would prevent the exit.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> tl;dr;
>>>>>>>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I have both.
>>>>>>>> > > > > > > > > > > >> > >> This entails including it officially in the Release process, removing
>>>>>>>> > > > > > > > > > > >> > >> the various "experimental" text throughout the repo, etc., and
>>>>>>>> > > > > > > > > > > >> > >> otherwise treating it like Python and Java. There are also some
>>>>>>>> > > > > > > > > > > >> > >> Go-specific tasks around dep versioning.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam model efficiently for most batch tasks,
>>>>>>>> > > > > > > > > > > >> > >> including basic windowing.
>>>>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested, on all Portable
>>>>>>>> > > > > > > > > > > >> > >> runners.
>>>>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going
>>>>>>>> > > > > > > > > > > >> > >> forward.
>>>>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via
>>>>>>>> > > > > > > > > > > >> > >> Cross Language transforms.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't
>>>>>>>> > > > > > > > > > > >> > >> help with that any further.
>>>>>>>> > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental
>>>>>>>> > > > > > > > > > > >> > >> markers dissuade users.
>>>>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK.
>>>>>>>> > > > > > > > > > > >> > >> (Contributions welcome)
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Typically when we call an SDK or API
>>>>>>>> Experimental, it's
>>>>>>>> > > > > > > > because
>>>>>>>> > > > > > > > > > there's a risk that API or behaviors may change
>>>>>>>> significantly.
>>>>>>>> > > > > > > > > > > >> > >> This in turn, leads to additional work
>>>>>>>> for users of
>>>>>>>> > > > > the SDK
>>>>>>>> > > > > > > > on
>>>>>>>> > > > > > > > > > every release which leads to sticking to older
>>>>>>>> versions or
>>>>>>>> > > > > forking
>>>>>>>> > > > > > > > > > > >> > >> to preserve behavior. Version updates
>>>>>>>> should be looked
>>>>>>>> > > > > > > > forward
>>>>>>>> > > > > > > > > > to, and viewed as having little risk. Further
>>>>>>>> while there's been
>>>>>>>> > > > > > > > > > > >> > >> previous dicussion about what the "low
>>>>>>>> bar" is for a
>>>>>>>> > > > > new
>>>>>>>> > > > > > > > SDK, it
>>>>>>>> > > > > > > > > > hasn't been summarily applied to the Go SDK. I
>>>>>>>> feel this has
>>>>>>>> > > > > > > > > > > >> > >> hurt development and contribution of
>>>>>>>> new SDK languages
>>>>>>>> > > > > > > > (inherent
>>>>>>>> > > > > > > > > > difficulty of SDK development notwithstanding).
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't
>>>>>>>> entirely clear
>>>>>>>> > > > > what the
>>>>>>>> > > > > > > > > > Beam Model should look like in an opinionated
>>>>>>>> language like Go.
>>>>>>>> > > > > > > > > > > >> > >> Their initial take (see
>>>>>>>> > > > > > > > > > https://s.apache.org/beam-go-sdk-design-rfc [0])
>>>>>>>> goes into
>>>>>>>> > > > > detail
>>>>>>>> > > > > > > > what it
>>>>>>>> > > > > > > > > > means for a language without
>>>>>>>> > > > > > > > > > > >> > >> Generics, or overloading, or
>>>>>>>> inheritance to implement
>>>>>>>> > > > > the
>>>>>>>> > > > > > > > beam
>>>>>>>> > > > > > > > > > model. One could largely throw away static types
>>>>>>>> (like Python),
>>>>>>>> > > > > > > > > > > >> > >> but this approach rings hollow for Go.
>>>>>>>> It would not do
>>>>>>>> > > > > if the
>>>>>>>> > > > > > > > > > approach couldn't grow and scale to the Beam
>>>>>>>> Model. It's also
>>>>>>>> > > > > hard
>>>>>>>> > > > > > > > > > > >> > >> to tell if an API is any good before
>>>>>>>> there are users.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Further, in the early days of
>>>>>>>> Portability, there
>>>>>>>> > > > > wasn't a
>>>>>>>> > > > > > > > way to
>>>>>>>> > > > > > > > > > write scalable DoFns, dynamically or otherwise.
>>>>>>>> It's an
>>>>>>>> > > > > incredible
>>>>>>>> > > > > > > > > > > >> > >> bottleneck to need to do all initial
>>>>>>>> fanout of work on
>>>>>>>> > > > > a
>>>>>>>> > > > > > > > single
>>>>>>>> > > > > > > > > > machine, write everything to a Reshuffle, just in
>>>>>>>> order to scale
>>>>>>>> > > > > up.
>>>>>>>> > > > > > > > > > > >> > >> Without being able to scale, Beam is
>>>>>>>> little more than
>>>>>>>> > > > > > > > overhead.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are
>>>>>>>> met within the
>>>>>>>> > > > > Go SDK
>>>>>>>> > > > > > > > for
>>>>>>>> > > > > > > > > > open source.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Background
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the beam
>>>>>>>> repo for a few
>>>>>>>> > > > > years
>>>>>>>> > > > > > > > now,
>>>>>>>> > > > > > > > > > since it was accidentally merged into master.
>>>>>>>> > > > > > > > > > > >> > >> Since then it's been called
>>>>>>>> experimental, and not
>>>>>>>> > > > > officially
>>>>>>>> > > > > > > > > > part of the releases.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Of the SDKs, it's was always designed
>>>>>>>> around Beam
>>>>>>>> > > > > Portability
>>>>>>>> > > > > > > > > > first. It never had any "Legacy" (SDK x Runner
>>>>>>>> specific )
>>>>>>>> > > > > workers.
>>>>>>>> > > > > > > > > > > >> > >> It's always used the Beam Pipeline
>>>>>>>> protos and FnAPI to
>>>>>>>> > > > > > > > execute
>>>>>>>> > > > > > > > > > jobs, first with some very experimental code on
>>>>>>>> Dataflow, but now
>>>>>>>> > > > > > > > > > > >> > >> on all portable supported runners, like
>>>>>>>> Flink, Spark,
>>>>>>>> > > > > the
>>>>>>>> > > > > > > > Python
>>>>>>>> > > > > > > > > > Portable runner, and Dataflow.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> API Stability
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed
>>>>>>>> it's user API
>>>>>>>> > > > > for DoFn
>>>>>>>> > > > > > > > > > and pipeline construction since it was first
>>>>>>>> merged in, and
>>>>>>>> > > > > there are
>>>>>>>> > > > > > > > no
>>>>>>>> > > > > > > > > > > >> > >> changes to that on the horizon that
>>>>>>>> can't be made in a
>>>>>>>> > > > > > > > backwards
>>>>>>>> > > > > > > > > > compatible manner. Largely these are related to
>>>>>>>> New Features, or
>>>>>>>> > > > > > > > > > > >> > >> usability improvements enabled by the
>>>>>>>> advent of Go
>>>>>>>> > > > > Generics
>>>>>>>> > > > > > > > > > (think of "real" KV, emitter, and iterator types).
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has
>>>>>>>> largely been
>>>>>>>> > > > > under
>>>>>>>> > > > > > > > work
>>>>>>>> > > > > > > > > > for use within Google. It's use is called
>>>>>>>> FlumeGo, representing
>>>>>>>> > > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top
>>>>>>>> of Flume,
>>>>>>>> > > > > Google's
>>>>>>>> > > > > > > > batch
>>>>>>>> > > > > > > > > > pipeline processing engine. Thus most of the
>>>>>>>> focus on improving
>>>>>>>> > > > > > > > > > > >> > >> batch execution. FlumeGo sees ample use
>>>>>>>> today, and
>>>>>>>> > > > > there
>>>>>>>> > > > > > > > hasn't
>>>>>>>> > > > > > > > > > been a call for fundamental changes to the API
>>>>>>>> for ergonomic or
>>>>>>>> > > > > > > > > > > >> > >> usability concerns.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Scalability
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Google could get away without the Go
>>>>>>>> SDK having an SDK
>>>>>>>> > > > > side
>>>>>>>> > > > > > > > > > scalability solution as a result of it's
>>>>>>>> integration with Flume.
>>>>>>>> > > > > > > > > > > >> > >> However, those days are now past.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns
>>>>>>>> along with
>>>>>>>> > > > > Dynamic
>>>>>>>> > > > > > > > > > Splitting, which supports writing scalable batch
>>>>>>>> transforms
>>>>>>>> > > > > natively
>>>>>>>> > > > > > > > > > > >> > >> in the Go SDK.
>>>>>>>> > > > > > > > > > > >> > >> The SDK also supports Cross Language
>>>>>>>> Transforms, with
>>>>>>>> > > > > Beam
>>>>>>>> > > > > > > > > > Schema encodings. With it, production hardened
>>>>>>>> transforms
>>>>>>>> > > > > > > > > > > >> > >> from Java and Python are a wrapper away.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who
>>>>>>>> implemented the SDF
>>>>>>>> > > > > side
>>>>>>>> > > > > > > > work,
>>>>>>>> > > > > > > > > > and completed the Xlang work,) is adding a
>>>>>>>> wrapper for the
>>>>>>>> > > > > > > > > > > >> > >> Java Kafka IO using Cross Language
>>>>>>>> Transforms, which
>>>>>>>> > > > > is often
>>>>>>>> > > > > > > > > > been requested. This will also enable use of the
>>>>>>>> Beam SQL
>>>>>>>> > > > > > > > > > > >> > >> transforms that java enables.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Features
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam C=core.
>>>>>>>> The Go SDK
>>>>>>>> > > > > implements
>>>>>>>> > > > > > > > > > standard coders, allows for user DoFns, and
>>>>>>>> CombineFns and access
>>>>>>>> > > > > > > > > > > >> > >> to core transforms like Flatten,
>>>>>>>> GroupByKey, and
>>>>>>>> > > > > features
>>>>>>>> > > > > > > > like
>>>>>>>> > > > > > > > > > Side Inputs, Windowing, and User Metrics.
>>>>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported
>>>>>>>> for batch even
>>>>>>>> > > > > > > > through
>>>>>>>> > > > > > > > > > lifted combines in the 2.32.0 release.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be
>>>>>>>> versatile for
>>>>>>>> > > > > batch
>>>>>>>> > > > > > > > > > execution on portable runners, and for simple
>>>>>>>> streaming
>>>>>>>> > > > > pipelines.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Repo Testing
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> On precommit the Go SDK runs all it's
>>>>>>>> unit tests. On
>>>>>>>> > > > > top of
>>>>>>>> > > > > > > > > > that, it runs all it's integration tests against
>>>>>>>> the Python
>>>>>>>> > > > > Portable
>>>>>>>> > > > > > > > runner,
>>>>>>>> > > > > > > > > > > >> > >> making it quick and robust to detect
>>>>>>>> breaking changes
>>>>>>>> > > > > without
>>>>>>>> > > > > > > > > > overspending community resources. Those same
>>>>>>>> tests are also
>>>>>>>> > > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The tests are executable against all
>>>>>>>> runners via the
>>>>>>>> > > > > > > > appropriate
>>>>>>>> > > > > > > > > > Go commands (if you've stood up your own job
>>>>>>>> management server),
>>>>>>>> > > > > > > > > > > >> > >> or Gradle commands (which will spin up
>>>>>>>> runner
>>>>>>>> > > > > instances for
>>>>>>>> > > > > > > > > > you). Documentation for executing tests and
>>>>>>>> adding new ones
>>>>>>>> > > > > > > > > > > >> > >> is on the wiki. [2] They are accessible
>>>>>>>> to Go
>>>>>>>> > > > > developers as
>>>>>>>> > > > > > > > > > they're implemented with the standard Go testing
>>>>>>>> tools.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Shortcomings
>>>>>>>> > > > > > > > > > > >> > >> That said, there's still much to do.
>>>>>>>> Let me briefly
>>>>>>>> > > > > tell you
>>>>>>>> > > > > > > > > > what doesn't work, and it's up to you to weigh
>>>>>>>> whether they block
>>>>>>>> > > > > > > > > > > >> > >> being out of experimental.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> At present, only a textio has been
>>>>>>>> implemented as
>>>>>>>> > > > > Splittable
>>>>>>>> > > > > > > > > > DoFn.
>>>>>>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it
>>>>>>>> will serve as
>>>>>>>> > > > > a the
>>>>>>>> > > > > > > > > > first example for future contributions for
>>>>>>>> > > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
>>>>>>>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at
>>>>>>>> this point
>>>>>>>> > > > > users are
>>>>>>>> > > > > > > > > > empowered to write their own DoFns or wrap
>>>>>>>> existing transforms
>>>>>>>> > > > > for
>>>>>>>> > > > > > > > Cross
>>>>>>>> > > > > > > > > > Language use.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> In the core SDK, more streaming focused
>>>>>>>> features have
>>>>>>>> > > > > yet to
>>>>>>>> > > > > > > > be
>>>>>>>> > > > > > > > > > implemented, but they're largely additions to
>>>>>>>> what exists already
>>>>>>>> > > > > > > > > > > >> > >> rather than total rebuilds. Much of the
>>>>>>>> work is
>>>>>>>> > > > > definining
>>>>>>>> > > > > > > > how a
>>>>>>>> > > > > > > > > > user specifies their desires, and turning those
>>>>>>>> into the
>>>>>>>> > > > > appropriate
>>>>>>>> > > > > > > > > > > >> > >> FnAPI requests at execution time. Back
>>>>>>>> in October I
>>>>>>>> > > > > wrote at
>>>>>>>> > > > > > > > > > length on the wiki [1] what's missing for
>>>>>>>> additional streaming
>>>>>>>> > > > > > > > features.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> While we have bolstered our testing
>>>>>>>> recently, there's
>>>>>>>> > > > > likely
>>>>>>>> > > > > > > > > > still more we could test to improve our
>>>>>>>> confidence in the SDK,
>>>>>>>> > > > > > > > > > > >> > >> in particular regarding the included
>>>>>>>> transforms
>>>>>>>> > > > > libraries and
>>>>>>>> > > > > > > > > > examples.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Moving Forward
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> My immediate plan is to work on
>>>>>>>> incorporating the Go
>>>>>>>> > > > > SDK
>>>>>>>> > > > > > > > fully
>>>>>>>> > > > > > > > > > into the Beam Programming Guide. I've audited the
>>>>>>>> guide [3], and
>>>>>>>> > > > > > > > > > > >> > >> am beginning to add missing content and
>>>>>>>> filling in the
>>>>>>>> > > > > Go
>>>>>>>> > > > > > > > > > specific gaps. This will be tied to improving the
>>>>>>>> Go Doc with
>>>>>>>> > > > > more Go
>>>>>>>> > > > > > > > > > > >> > >> specific user documentation that isn't
>>>>>>>> appropriate for
>>>>>>>> > > > > the
>>>>>>>> > > > > > > > BPG.
>>>>>>>> > > > > > > > > > > >> > >> And resolving the LICENSE issue around
>>>>>>>> the public
>>>>>>>> > > > > display of
>>>>>>>> > > > > > > > > > that GoDoc.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> If this proposal is accepted by a
>>>>>>>> binding vote, I will
>>>>>>>> > > > > > > > > > incorporate the SDK into the release process, and
>>>>>>>> remove the
>>>>>>>> > > > > > > > "experimental"
>>>>>>>> > > > > > > > > > > >> > >> language around the SDK. This largely
>>>>>>>> entails updating
>>>>>>>> > > > > the
>>>>>>>> > > > > > > > > > release scripts to also build and publish the Go
>>>>>>>> SDK Docker
>>>>>>>> > > > > containers.
>>>>>>>> > > > > > > > > > > >> > >> As for releasing the code, we're
>>>>>>>> technically already
>>>>>>>> > > > > doing so
>>>>>>>> > > > > > > > > > whenever we tag a release branch [4].
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> The clearest signal to the Go community
>>>>>>>> however will be
>>>>>>>> > > > > > > > > > migrating the SDK to use Go Modules for
>>>>>>>> dependency version
>>>>>>>> > > > > control,
>>>>>>>> > > > > > > > > > > >> > >> which Daniel is planning on working on
>>>>>>>> after his Kafka
>>>>>>>> > > > > task.
>>>>>>>> > > > > > > > > > This will put our repo infrastructure, SDK
>>>>>>>> contributors, and
>>>>>>>> > > > > users
>>>>>>>> > > > > > > > > > > >> > >> on the same footing when it comes to
>>>>>>>> dependency
>>>>>>>> > > > > management.
>>>>>>>> > > > > > > > It
>>>>>>>> > > > > > > > > > will remove the "+incompatible" tags one sees on
>>>>>>>> the
>>>>>>>> > > > > > > > > > > >> > >> pkg.go.dev list at [4].
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> I'm very happy to answer any questions
>>>>>>>> you might have
>>>>>>>> > > > > about
>>>>>>>> > > > > > > > the
>>>>>>>> > > > > > > > > > SDK, and provide additional links as needed. I
>>>>>>>> intentionally
>>>>>>>> > > > > avoided
>>>>>>>> > > > > > > > > > > >> > >> a link barrage in this email, as they
>>>>>>>> can distract
>>>>>>>> > > > > from the
>>>>>>>> > > > > > > > > > point: The SDK is ready for folks to use it, we
>>>>>>>> need to tell
>>>>>>>> > > > > them that
>>>>>>>> > > > > > > > they
>>>>>>>> > > > > > > > > > can
>>>>>>>> > > > > > > > > > > >> > >> rather than they shouldn't.
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> Robert Burke
>>>>>>>> > > > > > > > > > > >> > >> Defacto Beam Go TL
>>>>>>>> > > > > > > > > > > >> > >>
>>>>>>>> > > > > > > > > > > >> > >> [0]
>>>>>>>> https://s.apache.org/beam-go-sdk-design-rfc
>>>>>>>> > > > > > > > > > > >> > >> [1]
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > >
>>>>>>>> > > > >
>>>>>>>> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>>>>>>>> > > > > > > > > > > >> > >> [2]
>>>>>>>> > > > > https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>>>>>>>> > > > > > > > > > > >> > >> [3]
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > >
>>>>>>>> > > > >
>>>>>>>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>>>> > > > > > > > > > (SDK Audit sheet)
>>>>>>>> > > > > > > > > > > >> > >> [4]
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > >
>>>>>>>> > > > >
>>>>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>>>>>>>> > > > > > > > > > > >> >
>>>>>>>> > > > > > > > > >
>>>>>>>> > > > > > > > >
>>>>>>>> > > > > > > >
>>>>>>>> > > > > > >
>>>>>>>> > > > > >
>>>>>>>> > > > >
>>>>>>>> > > >
>>>>>>>> > >
>>>>>>>>
>>>>>>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
I think so.

My tweet [1] on the topic got a bit of traction even without the official
Beam account boosting it.

[1]
https://twitter.com/lostluck/status/1456720240092467200?t=owVJd6ZuTVMUkNyvYNr4Xg&s=19

On Wed, Nov 24, 2021, 10:11 AM Ahmet Altay <al...@google.com> wrote:

> Thank you Rebo, and congratulations to everyone working on Go SDK :)
>
> @Robert Burke <re...@google.com> @Brittany Hermann <he...@google.com> @Sachin
> Agarwal <sa...@google.com> - Should we share this on Beam's twitter
> and other social media pages?
>
> On Fri, Nov 5, 2021 at 1:29 PM Robert Burke <ro...@frantil.com> wrote:
>
>> It's my great pleasure to announce that the Apache Beam Go SDK is no
>> longer experimental. https://beam.apache.org/blog/go-sdk-release/
>>
>> Thank you everyone.
>> Robert Burke
>> Beam Go Busybody
>>
>> On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:
>>
>>> At this point I just need an LGTM on the blog post PR, as the draft is
>>> finalized.
>>>
>>> Udi added the sdks/v2.33.0 tag, which works as expected. I've also
>>> verified that the appropriate container is used by default when none is
>>> specified, which was the last unknown in this process.
>>>
>>> Who's ready to release a new SDK? I am!
>>>
>>>  https://github.com/apache/beam/pull/15894 (or join the exciting
>>> reaction emoji on the top post).
>>>
>>>
>>>
>>> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> The current draft of the exit blog post is
>>>> https://github.com/apache/beam/pull/15894
>>>> Comments are very welcome. I'm going to continue looking for known
>>>> issues (which will be linked to their respective JIRAs) tomorrow.
>>>>
>>>> Since RC1 is getting cycled, I can also go back to the original plan of
>>>> v2.33.0, if we'd like to get it out this week.
>>>>
>>>>
>>>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>>>
>>>>> Investigation yielded that there's no way around the prefixed tags.
>>>>> The JIRA has been commented with the explanation.
>>>>>
>>>>> https://github.com/apache/beam/pull/15881 has the release script
>>>>> updates.
>>>>>
>>>>> I'm working on the Exit blogpost and the updated Go SDK roadmap. The
>>>>> draft PR will be linked here.
>>>>>
>>>>> Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm
>>>>> inclined to wait for that release to finish before publishing the blog
>>>>> post.
>>>>>
>>>>> Once 2.34.0 is released, I'd still like 2.33.0 to be prefix-tagged as
>>>>> well, so there isn't a gap in versions between the pre-module and
>>>>> module-based code.
>>>>>
>>>>> Once the post is published, that'll be the end of this thread.
>>>>>
>>>>> Thank you very much everyone.
>>>>>
>>>>> Robert Burke
>>>>> Beam Go Busybody
>>>>>
>>>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>>>>>
>>>>>> +1 to extra tags. They'll be trivial to add to our release process,
>>>>>> and git tags are lightweight by design so I don't foresee any problems.
>>>>>>
>>>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>>>> too much of a problem (until, hopefully, it's fixed on the Go side).
>>>>>>>
>>>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
>>>>>>> >
>>>>>>> > With Kyle's help with the additional tagging of the next RC, we have
>>>>>>> > validated that this is currently the correct approach.
>>>>>>> >
>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>>>> >
>>>>>>> > Or even:
>>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links
>>>>>>> > to the latest tagged version)
>>>>>>> >
>>>>>>> > The main cost of this approach is doubling the number of tags in the
>>>>>>> > tags list (https://github.com/apache/beam/tags), which is not ideal,
>>>>>>> > but overall a small cost. There's no need for a "full publish" of
>>>>>>> > these additional tags, so we won't be doubling our "releases" (see
>>>>>>> > https://github.com/apache/beam/releases).
>>>>>>> >
>>>>>>> > I'll still be filing a bug against the Go commands, since the
>>>>>>> > mandatory prefixing is unintuitive and seems unnecessary. If the
>>>>>>> > prefixing stops being required, we can always delete the tags from
>>>>>>> > the affected branches and stop adding them going forward. I'll
>>>>>>> > search through the existing Go issues first, however, to see if this
>>>>>>> > has been previously discussed, and report my findings here either
>>>>>>> > way.
>>>>>>> >
>>>>>>> > This does require 2 small changes to the release process: the RC
>>>>>>> > tagging script, and the final tagging step in the release guide:
>>>>>>> >
>>>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>>>> >
>>>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>>>> >
>>>>>>> > I'll make this change later this week (or early next), assuming there
>>>>>>> > are no objections.
>>>>>>> >
>>>>>>> > Thank you all very much for your patience,
>>>>>>> > Robert Burke
>>>>>>> > Beam Go Busybody
>>>>>>> >
>>>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>> > > After much research reading the Go Modules documentation, I have
>>>>>>> > > confirmed what the issue is.
>>>>>>> > >
>>>>>>> > > We added the go.mod file to sdks/ under the repo root because it's
>>>>>>> > > a cleaner spot for the change, captures the Java and Python
>>>>>>> > > container boot code (written in Go) into the module, and avoids
>>>>>>> > > conflicting interpretations of the vendor directory that lives at
>>>>>>> > > the root level.
>>>>>>> > >
>>>>>>> > > However, we missed that when doing so, the standard version tags
>>>>>>> > > would only apply to modules at the root level, not to modules in
>>>>>>> > > subdirectories. See https://golang.org/ref/mod#vcs-version, but
>>>>>>> > > quoting the important paragraph:
>>>>>>> > >
>>>>>>> > > > If a module is defined in a subdirectory within the repository,
>>>>>>> > > > that is, the module subdirectory portion of the module path is
>>>>>>> > > > not empty, then each tag name must be prefixed with the module
>>>>>>> > > > subdirectory, followed by a slash. For example, the module
>>>>>>> > > > golang.org/x/tools/gopls is defined in the gopls subdirectory of
>>>>>>> > > > the repository with root path golang.org/x/tools. The version
>>>>>>> > > > v0.4.0 of that module must have the tag named gopls/v0.4.0 in
>>>>>>> > > > that repository.
>>>>>>> > >
>>>>>>> > > Specifically, for the Go SDK to be fetched at the right version, we
>>>>>>> > > need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>>>> > >
>>>>>>> > > So, the fix for the Go versioning issue is to amend our release
>>>>>>> > > process (including generating Release Candidate builds) to also add
>>>>>>> > > a prefixed version tag with the same version.
>>>>>>> > >
>>>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there
>>>>>>> > > are no objections, we can back-update the 2.33.0 release branch
>>>>>>> > > with such a prefixed tag. At that point I can also write the
>>>>>>> > > official Experimental Exit blog post.
>>>>>>> > >
>>>>>>> > > Thank you all for your patience.
>>>>>>> > > Robert Burke
>>>>>>> > >
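The subdirectory tag rule quoted in the message above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not Beam's release tooling: the helpers `moduleTag` and `releaseTags` are hypothetical, and the module subdirectory `sdks` is taken from this thread. The sketch only computes which tag names a release would carry if, as proposed, the existing repository tag is kept and a prefixed tag is added for the module in `sdks/`.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleTag returns the VCS tag name the go command looks for when resolving
// `version` of a module defined in `subdir` of its repository. Per the Go
// modules reference, a module in a subdirectory needs the tag prefixed with
// that subdirectory and a slash; a module at the repository root uses the
// bare version. (Hypothetical helper for illustration.)
func moduleTag(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version
	}
	return subdir + "/" + version
}

// releaseTags lists the tags a release would need if the plain repository
// tag is kept and a prefixed tag is added for the module rooted at subdir.
// (Hypothetical helper for illustration.)
func releaseTags(subdir, version string) []string {
	tags := []string{version}
	if p := moduleTag(subdir, version); p != version {
		tags = append(tags, p)
	}
	return tags
}

func main() {
	fmt.Println(moduleTag("gopls", "v0.4.0"))       // gopls/v0.4.0
	fmt.Println(releaseTags("sdks", "v2.34.0-RC1")) // [v2.34.0-RC1 sdks/v2.34.0-RC1]
}
```

Creating the actual git tags is left to the release scripts; the sketch only shows why each Beam version ends up with two tag names.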
>>>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>>>> > > >
>>>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>> > > >
>>>>>>> > > > > This is a status update.
>>>>>>> > > > >
>>>>>>> > > > > At this point 2.33.0 is released, but there are difficulties accessing
>>>>>>> > > > > the tagged versions using the standard Go tools. It's currently under
>>>>>>> > > > > investigation.
>>>>>>> > > > >
>>>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>>>>>>> > > > > populates the go.mod file with a pseudo-version rather than the latest
>>>>>>> > > > > tag (v2.33.0). For example, the line looks like:
>>>>>>> > > > >
>>>>>>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>>>>>> > > > >
>>>>>>> > > > > While this will work, it's not the desired experience for users at this
>>>>>>> > > > > point. The current downside is that the releases are not meaningful
>>>>>>> > > > > targets for some reason. However, we retain the other benefits of Go
>>>>>>> > > > > Modules (actual dependency versioning, management by the go tools).
>>>>>>> > > > >
>>>>>>> > > > > The issue is some combination of the go tooling [A], the fact that we
>>>>>>> > > > > added a go.mod file outside of the repo root [B], and the fact that we did
>>>>>>> > > > > not increment the major version (v2 -> v3) when adding the go.mod file [C].
>>>>>>> > > > >
>>>>>>> > > > > [B] From the Go documentation, this should be legal and fine, even if
>>>>>>> > > > > it's not recommended. This is fortunate, because placing the module at
>>>>>>> > > > > the repo root would have played poorly with the root vendor directory,
>>>>>>> > > > > which the go tools have opinions on.
>>>>>>> > > > >
>>>>>>> > > > > [C] The Go Modules documentation recommends incrementing the major
>>>>>>> > > > > version when transitioning to Go Modules. However, it never said this
>>>>>>> > > > > was required, nor did it indicate this current failure mode. If this
>>>>>>> > > > > isn't another bug, it should at least be documented in those docs. We
>>>>>>> > > > > would not necessarily want to declare a global v3 for Beam at this time
>>>>>>> > > > > just for the Go SDK; it would become confusing rather quickly.
>>>>>>> > > > > Notionally, there are some larger breaking changes the Java and Python
>>>>>>> > > > > SDKs would want to make in such an event, so it's a larger conversation
>>>>>>> > > > > that is out of scope at this time.
>>>>>>> > > > >
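Point [C] interacts with the Go modules major version suffix rule: a module released at v2 or higher must have a module path ending in `/vN`, so a v3 bump would change every user's import path from `.../sdks/v2` to `.../sdks/v3`. A simplified check is sketched below. `majorOf` and `pathMatchesMajor` are hypothetical helpers, and the sketch deliberately ignores `gopkg.in` paths and other edge cases the go command handles.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf returns the major version number of a semantic version such as
// "v2.33.0" or "v2.34.0-RC1". (Hypothetical helper for illustration.)
func majorOf(version string) (int, error) {
	s := strings.TrimPrefix(version, "v")
	if i := strings.IndexAny(s, ".-+"); i >= 0 {
		s = s[:i]
	}
	return strconv.Atoi(s)
}

// pathMatchesMajor reports whether modPath follows a simplified form of the
// major version suffix rule: v0/v1 modules must not end in /vN with N >= 2,
// and v2+ modules must end in exactly /vN. gopkg.in paths and other special
// cases are not handled. (Hypothetical helper for illustration.)
func pathMatchesMajor(modPath, version string) (bool, error) {
	major, err := majorOf(version)
	if err != nil {
		return false, err
	}
	last := modPath[strings.LastIndex(modPath, "/")+1:]
	if major < 2 {
		n, convErr := strconv.Atoi(strings.TrimPrefix(last, "v"))
		hasSuffix := strings.HasPrefix(last, "v") && convErr == nil && n >= 2
		return !hasSuffix, nil
	}
	return last == "v"+strconv.Itoa(major), nil
}

func main() {
	ok, _ := pathMatchesMajor("github.com/apache/beam/sdks/v2", "v2.33.0")
	fmt.Println(ok) // true
	ok, _ = pathMatchesMajor("github.com/apache/beam/sdks/v2", "v3.0.0")
	fmt.Println(ok) // false: a v3 release would need a .../sdks/v3 path
}
```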
>>>>>>> > > > > This leaves [A], where some misunderstanding of the documented semantics
>>>>>>> > > > > occurred. I certainly expected the tagged version of the non-root Go
>>>>>>> > > > > module to be inherited from the parent, not wholesale ignored. As a
>>>>>>> > > > > result, I'll be filing a bug against the go tools to determine this, and
>>>>>>> > > > > see what paths forward exist.
>>>>>>> > > > >
>>>>>>> > > > > It's my hope to resolve this before we write a proper Experimental Exit
>>>>>>> > > > > blog post for the Go SDK.
>>>>>>> > > > >
>>>>>>> > > > > Thank you for your patience, and time.
>>>>>>> > > > > Robert Burke
>>>>>>> > > > > Beam Go Busybody
>>>>>>> > > > >
>>>>>>> > > > >
>>>>>>> > > > >
>>>>>>> > > > >
>>>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses
>>>>>>> > > > > > Go Modules for dependency management, simplifying Go SDK
>>>>>>> > > > > > contributions. [2]
>>>>>>> > > > > >
>>>>>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go
>>>>>>> > > > > > module for the whole SDK, tests, examples, and any support code for
>>>>>>> > > > > > the container boot builds. This excludes the Go modules for the Go
>>>>>>> > > > > > SDK code katas [3], which can be updated once 2.33.0 has been
>>>>>>> > > > > > released.
>>>>>>> > > > > >
>>>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and
>>>>>>> > > > > > by default uses the release-specific container for Docker execution
>>>>>>> > > > > > jobs. For at least the 2.33.0 release, this does mean that manual
>>>>>>> > > > > > validation will need to explicitly specify RC versions of containers.
>>>>>>> > > > > > However, given that the Go SDK container and worker boot process
>>>>>>> > > > > > rarely changes, this is unlikely to be an issue.
>>>>>>> > > > > >
>>>>>>> > > > > > At present I'm cleaning up some of the references to experimental,
>>>>>>> > > > > > and making it clear that 2.33.0 is the first non-experimental release
>>>>>>> > > > > > (even though that's 4-6 weeks out from actual release). CHANGES.md
>>>>>>> > > > > > will be updated to note the event, but a larger blog post will happen
>>>>>>> > > > > > after the release goes public.
>>>>>>> > > > > >
>>>>>>> > > > > > Cheers,
>>>>>>> > > > > > Robert Burke
>>>>>>> > > > > > Defacto Beam Go TL.
>>>>>>> > > > > >
>>>>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>>>> > > > > >
>>>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>>>>>> > > > > > > +1, congratulations & thank you!
>>>>>>> > > > > > >
>>>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
>>>>>>> > > > > > >
>>>>>>> > > > > > > > Regarding the documentation update: the initial PR is
>>>>>>> > > > > > > > https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
>>>>>>> > > > > > > > JIRA link for Programming Guide changes:
>>>>>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>>>>>> > > > > > > >
>>>>>>> > > > > > > >
>>>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <robert@frantil.com> wrote:
>>>>>>> > > > > > > > > Yup!
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the
>>>>>>> > > > > > > > > Beam Programming Guide. I've audited the guide, and am beginning to add
>>>>>>> > > > > > > > > missing content and fill in the Go-specific gaps. This will be tied to
>>>>>>> > > > > > > > > improving the Go Doc with more Go-specific user documentation that isn't
>>>>>>> > > > > > > > > appropriate for the BPG.
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > My audit of the guide is here:
>>>>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > The other sheets focus on features and tests. The feature page looks
>>>>>>> > > > > > > > > worse than it is, as it was more productive to focus on what isn't
>>>>>>> > > > > > > > > available than what is. That's a snapshot of my actual working sheet,
>>>>>>> > > > > > > > > but I'll be updating it as needed.
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <
>>>>>>> iemejia@gmail.com>
>>>>>>> > > > > wrote:
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > > Oops, forgot to ask one question: will this come with revamped
>>>>>>> > > > > > > > > > website instructions/docs for golang too?
>>>>>>> > > > > > > > > >
>>>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <
>>>>>>> iemejia@gmail.com>
>>>>>>> > > > > > > > wrote:
>>>>>>> > > > > > > > > > >
>>>>>>> > > > > > > > > > > Huge +1
>>>>>>> > > > > > > > > > >
>>>>>>> > > > > > > > > > > This is definitely something many people have asked about, so it is
>>>>>>> > > > > > > > > > > great to see it finally happening.
>>>>>>> > > > > > > > > > >
>>>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <
>>>>>>> > > > > kenn@apache.org>
>>>>>>> > > > > > > > wrote:
>>>>>>> > > > > > > > > > > >
>>>>>>> > > > > > > > > > > > +1 awesome
>>>>>>> > > > > > > > > > > >
>>>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <
>>>>>>> > > > > lostluck@apache.org
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > > > > wrote:
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and
>>>>>>> > > > > > > > > > > >> LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut
>>>>>>> > > > > > > > > > > >> if release images aren't added to the 2.32 process.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder
>>>>>>> > > > > > > > > > > >> break between a newer generics-first API and the current version, but
>>>>>>> > > > > > > > > > > >> there's no rush. Generics/type parameters in Go aren't identical to the
>>>>>>> > > > > > > > > > > >> feature referred to by that term in Java, C++, Rust, etc., so it'll take
>>>>>>> > > > > > > > > > > >> a bit of time for that expertise to develop.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> However, given the current nature of Go, we already had to build fairly
>>>>>>> > > > > > > > > > > >> sophisticated reflective analysis to handle DoFns and map them to their
>>>>>>> > > > > > > > > > > >> graph inputs. So adding new helpers like KV, emitter, and iterator types
>>>>>>> > > > > > > > > > > >> shouldn't be too difficult. Changing Go SDK internals to use generics
>>>>>>> > > > > > > > > > > >> (for example, the implementations of stats DoFns like Min, Max, etc.)
>>>>>>> > > > > > > > > > > >> could also be done transparently to most users, and any of the framework
>>>>>>> > > > > > > > > > > >> for execution-time handling (the "worker's SDK harness") could certainly
>>>>>>> > > > > > > > > > > >> be cleaned up if need be. Also, more sophisticated DoFn registration and
>>>>>>> > > > > > > > > > > >> code generation could replace the optional code generator entirely,
>>>>>>> > > > > > > > > > > >> saving some users a `go generate` step and making improved execution
>>>>>>> > > > > > > > > > > >> performance simpler to get.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Changing things like making PCollection type-parameterized would be far
>>>>>>> > > > > > > > > > > >> more involved, as would trying to use some kind of Apply format. The lack
>>>>>>> > > > > > > > > > > >> of method overrides prevents the apply-chaining approach, or at least
>>>>>>> > > > > > > > > > > >> prevents it from working simply.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't until
>>>>>>> > > > > > > > > > > >> next year. See https://blog.golang.org/generics-proposal for details.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based
>>>>>>> > > > > > > > > > > >> calling convention, leading to a modest performance improvement across
>>>>>>> > > > > > > > > > > >> the board.
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> Cheers,
>>>>>>> > > > > > > > > > > >> Robert Burke
>>>>>>> > > > > > > > > > > >>
>>>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>>>>>>> > > > > robertwb@google.com>
>>>>>>> > > > > > > > wrote:
>>>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules
>>>>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs to support every feature
>>>>>>> > > > > > > > > > > >> > to be accepted, especially now that we can do cross-language
>>>>>>> > > > > > > > > > > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
>>>>>>> > > > > > > > > > > >> > streaming, my understanding is that Go supports the streaming model
>>>>>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on a streaming runner, even
>>>>>>> > > > > > > > > > > >> > if more advanced features like state and timers aren't yet available.)
>>>>>>> > > > > > > > > > > >> >
>>>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>>>> > > > > > > > > > > >> >
>>>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson
>>>>>>> Hamilton <
>>>>>>> > > > > > > > tysonjh@google.com>
>>>>>>> > > > > > > > > > wrote:
>>>>>>> > > > > > > > > > > >> > >
>>>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>>>> > > > > > > > > > > >> > >
>>>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are
>>>>>>> > > > > > > > > > > >> > > completed and the LICENSE issue is resolved. I don't think that lacking
>>>>>>> > > > > > > > > > > >> > > streaming support is a blocker. The other thing I checked was whether
>>>>>>> > > > > > > > > > > >> > > there were metrics available on metrics.beam.apache.org, specifically for
>>>>>>> > > > > > > > > > > >> > > measuring code health via post-commit over time; there are, and the
>>>>>>> > > > > > > > > > > >> > > passing test rate is high (Huzzah!). The one thing that surprised me from
>>>>>>> > > > > > > > > > > >> > > your summary is that when Go introduces generics it won't result in any
>>>>>>> > > > > > > > > > > >> > > backwards-incompatible changes in Apache Beam. That's great news, but does
>>>>>>> > > > > > > > > > > >> > > it mean there will be a need to support both non-generic and generic APIs
>>>>>>> > > > > > > > > > > >> > > moving forward? It seems like generics will be introduced in the Go 1.17
>>>>>>> > > > > > > > > > > >> > > release (optimistically) in August this year.
>>>>>>> > > > > > > > > > > >> > >
>>>>>>> > > > > > > > > > > >> >
>>>>>>> > > > > > > > > >
>>>>>>> > > > > > > > >
>>>>>>> > > > > > > >
>>>>>>> > > > > > >
>>>>>>> > > > > >
>>>>>>> > > > >
>>>>>>> > > >
>>>>>>> > >
>>>>>>>
>>>>>>
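Robert Burke's quoted reply above notes that Go generics (expected in Go 1.18) would make it easy to add helpers such as KV, emitter, and iterator types, and to implement stats DoFns like Min and Max generically. The following is a minimal, hypothetical sketch of what such helpers could look like, assuming Go 1.18 or later. `KV`, `Emitter`, `Iter`, `MinPerKey`, and `sliceIter` are illustrative names invented here, not Beam Go SDK APIs; only the `func(T)` emitter and `func(*T) bool` iterator shapes mirror conventions the Go SDK already uses for DoFn parameters.

```go
// A hypothetical sketch (Go 1.18+) of generics-based helper types of the kind
// discussed for the Beam Go SDK. KV, Emitter, Iter, MinPerKey and sliceIter
// are illustrative names only; they are NOT Beam Go SDK APIs.
package main

import "fmt"

// KV is a hypothetical statically typed key/value pair.
type KV[K, V any] struct {
	Key   K
	Value V
}

// Emitter names the func(T) shape the Go SDK already uses for DoFn emitters.
type Emitter[T any] func(T)

// Iter names the func(*T) bool shape the Go SDK already uses for iterating
// grouped values: it fills *T and reports whether a value was produced.
type Iter[T any] func(*T) bool

// ordered is a local constraint so the sketch needs only the standard library.
type ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// MinPerKey is a hypothetical generic DoFn-style function: it reads every value
// for one key through an Iter and emits a single KV holding the minimum.
// The compiler, not runtime reflection, checks that the element types line up.
func MinPerKey[K any, V ordered](key K, values Iter[V], emit Emitter[KV[K, V]]) {
	var v V
	if !values(&v) {
		return // no values for this key: emit nothing
	}
	smallest := v
	for values(&v) {
		if v < smallest {
			smallest = v
		}
	}
	emit(KV[K, V]{Key: key, Value: smallest})
}

// sliceIter adapts a slice to the Iter shape for local testing.
func sliceIter[T any](xs []T) Iter[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

func main() {
	var out []KV[string, int]
	MinPerKey[string, int]("a", sliceIter([]int{5, 2, 9}), func(kv KV[string, int]) {
		out = append(out, kv)
	})
	fmt.Println(out) // [{a 2}]
}
```

The same `MinPerKey` works unchanged for `int`, `float64`, or `string` values; the sketch says nothing about how the real SDK would register or execute such a function.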

Re: [Proposal] Go SDK Exits Experimental

Posted by Ahmet Altay <al...@google.com>.
Thank you Rebo, and congratulations to everyone working on Go SDK :)

@Robert Burke <re...@google.com> @Brittany Hermann <he...@google.com> @Sachin
Agarwal <sa...@google.com> - Should we share this on Beam's twitter and
other social media pages?

On Fri, Nov 5, 2021 at 1:29 PM Robert Burke <ro...@frantil.com> wrote:

> It's my great pleasure to announce that the Apache Beam Go SDK is no
> longer experimental. https://beam.apache.org/blog/go-sdk-release/
>
> Thank you everyone.
> Robert Burke
> Beam Go Busybody
>
> On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:
>
>> At this point I just need an LGTM on the blog post PR, as the draft is
>> finalized.
>>
>> Udi added the sdks/v2.33.0 tag, which works as expected. I've also
>> verified that the appropriate container is used by default when none is
>> specified, which was the last unknown in this process.
>>
>> Who's ready to release a new SDK? I am!
>>
>>  https://github.com/apache/beam/pull/15894 (or join the exciting
>> reaction emoji on the top post).
>>
>>
>>
>> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>>
>>> The current draft of the exit blog post is
>>> https://github.com/apache/beam/pull/15894
>>> Comments are very welcome. I'm going to continue looking for known
>>> issues (which will be linked to their respective JIRAs) tomorrow.
>>>
>>> Since RC1 is getting cycled, I can also go back to the original plan of
>>> v2.33.0, if we'd like to get it out this week.
>>>
>>>
>>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>>
>>>> Investigation showed that there's no way around the prefixed tags. I've
>>>> commented on the JIRA with the explanation.
>>>>
>>>> https://github.com/apache/beam/pull/15881 has the release script
>>>> updates.
>>>>
>>>> I'm working on the Exit blogpost and the updated Go SDK roadmap, and
>>>> will link the draft PR here as soon as it's ready.
>>>>
>>>> Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm
>>>> inclined to wait for that release to finish before publishing the blogpost.
>>>>
>>>> Once 2.34.0 is released, I'd still like to prefix-tag 2.33.0 as well, so
>>>> there isn't a gap in versions between the unmoduled code and the moduled
>>>> code.
>>>>
>>>> Once the blogpost is published, that'll be the end of this thread.
>>>>
>>>> Thank you very much everyone.
>>>>
>>>> Robert Burke
>>>> Beam Go Busybody
>>>>
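The prefixed-tag rule discussed in this thread can be sketched in a few lines of Go. Per the Go Modules reference (https://golang.org/ref/mod#vcs-version), a module whose go.mod lives in a repository subdirectory is resolved from tags prefixed with that subdirectory, which is why the `github.com/apache/beam/sdks/v2` module needs tags like `sdks/v2.33.0` rather than a plain `v2.33.0`. The sketch also checks that a version's major number matches the module path's `/vN` suffix, since the go command requires the two to agree for modules that have a go.mod. `majorSuffix` and `releaseTag` are hypothetical names for illustration, not part of Beam's release scripts or the go tool, and the sketch assumes the go.mod sits directly in `sdks/` (as described in the thread) rather than in a major-version subdirectory.

```go
// A hypothetical sketch of the Go modules tag-naming rule for a module whose
// go.mod lives in a repository subdirectory. majorSuffix and releaseTag are
// illustrative names; they are NOT part of Beam's release scripts or the go tool.
package main

import (
	"fmt"
	"strings"
)

// majorSuffix returns the major-version suffix ("v2", "v3", ...) of a module
// path, or "" if the last path element is not a /vN suffix with N >= 2.
func majorSuffix(modPath string) string {
	last := modPath[strings.LastIndex(modPath, "/")+1:]
	if len(last) < 2 || last[0] != 'v' || last[1] == '0' || last == "v1" {
		return ""
	}
	for _, c := range last[1:] {
		if c < '0' || c > '9' {
			return ""
		}
	}
	return last
}

// releaseTag returns the Git tag name the go command looks for when resolving
// version of modPath, given the module's subdirectory in the repository
// ("" for the repository root). It rejects versions whose major number
// disagrees with the module path's /vN suffix.
func releaseTag(modPath, subdir, version string) (string, error) {
	if want := majorSuffix(modPath); want != "" {
		if !strings.HasPrefix(version, want+".") {
			return "", fmt.Errorf("version %s does not match suffix /%s of %s", version, want, modPath)
		}
	} else if !strings.HasPrefix(version, "v0.") && !strings.HasPrefix(version, "v1.") {
		return "", fmt.Errorf("version %s needs a /vN suffix on module path %s", version, modPath)
	}
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version, nil // root module: plain tag such as v1.2.3
	}
	return subdir + "/" + version, nil // subdirectory module: prefixed tag
}

func main() {
	// Beam's layout as described in this thread: module path
	// github.com/apache/beam/sdks/v2 with its go.mod in the sdks/ directory.
	tag, err := releaseTag("github.com/apache/beam/sdks/v2", "sdks", "v2.33.0")
	fmt.Println(tag, err) // sdks/v2.33.0 <nil>
}
```

In Beam's case this is why each release carries both its usual `v2.x.y` tag and an `sdks/v2.x.y` tag, doubling the tag count as noted in the thread.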
>>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>>>>
>>>>> +1 to extra tags. They'll be trivial to add to our release process,
>>>>> and git tags are lightweight by design so I don't foresee any problems.
>>>>>
>>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
>>>>> wrote:
>>>>>
>>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>>> too much of a problem (until, hopefully, it's fixed on the go side).
>>>>>>
>>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org>
>>>>>> wrote:
>>>>>> >
>>>>>> > With Kyle's help with the additional tagging of the next RC, we have
>>>>>> > validated that this is the currently correct approach.
>>>>>> >
>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>>> >
>>>>>> > Or even:
>>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to
>>>>>> > the latest tagged version)
>>>>>> >
>>>>>> > The main cost of this approach is doubling the number of tags in the
>>>>>> > tags list (https://github.com/apache/beam/tags), which is not ideal, but
>>>>>> > overall a small cost. There's no need for a "full publish" of these
>>>>>> > additional tags, so we won't be doubling our "releases" (see
>>>>>> > https://github.com/apache/beam/releases).
>>>>>> >
>>>>>> > I'll still be filing a bug against the Go commands, since the mandatory
>>>>>> > prefixing is unintuitive and seems unnecessary. If the prefix ever becomes
>>>>>> > unnecessary, we can always delete the tags from the affected branches and
>>>>>> > cease the behavior going forward. I'll search through the existing Go
>>>>>> > issues first, however, to see if this has been previously discussed, and
>>>>>> > report my findings here either way.
>>>>>> >
>>>>>> > This does require 2 small changes to the release guide: the RC tagging
>>>>>> script, and the final tagging step:
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>>> >
>>>>>> >
>>>>>> https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>>> >
>>>>>> > I'll make this change later this week (or early next) assuming
>>>>>> there are no objections.
>>>>>> >
>>>>>> > Thank you all very much for your patience,
>>>>>> > Robert Burke
>>>>>> > Beam Go Busybody
>>>>>> >
>>>>>> >
>>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>>> > > After much research reading the Go Modules documentation, I
>>>>>> have confirmed what the issue is.
>>>>>> > >
>>>>>> > > We added the go.mod file to sdks/ under the repo root because
>>>>>> it's a cleaner spot for the change, captures the Java and Python container
>>>>>> boot code (written in Go) into the module and avoids conflicts in
>>>>>> interpretations of the vendor directory that lives at the root level.
>>>>>> > >
>>>>>> > > However, we missed that when doing so, the standard version tags
>>>>>> would only apply to modules at the root level, not to modules in
>>>>>> subdirectories. See https://golang.org/ref/mod#vcs-version; quoting the
>>>>>> important paragraph:
>>>>>> > >
>>>>>> > > > If a module is defined in a subdirectory within the repository,
>>>>>> that is, the module subdirectory portion of
>>>>>> > > > the module path is not empty, then each tag name must be
>>>>>> prefixed with the module subdirectory,
>>>>>> > > > followed by a slash. For example, the module
>>>>>> golang.org/x/tools/gopls is defined in the gopls
>>>>>> > > > subdirectory of the repository with root path
>>>>>> golang.org/x/tools. The version v0.4.0 of that module must have
>>>>>> the tag named gopls/v0.4.0 in that repository.
>>>>>> > >
>>>>>> > > Specifically, for the Go SDK to be fetched at the
>>>>>> right version, we need prefixed tags like "sdks/v2.33.0" or
>>>>>> "sdks/v2.34.0-RC1".
>>>>>> > >
>>>>>> > > So, the fix for the Go versioning issue is to amend our Release
>>>>>> process (including generating Release Candidate builds) to also add a
>>>>>> prefixed version tag with the same version.
>>>>>> > >
>>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if
>>>>>> there are no objections we can retroactively add such a prefixed tag to the
>>>>>> 2.33.0 release branch. At that point I can also write the Official
>>>>>> Experimental Exit Blog post.
>>>>>> > >
>>>>>> > > Thank you all for your patience.
>>>>>> > > Robert Burke
>>>>>> > >
>>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>>> > > >
>>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <
>>>>>> lostluck@apache.org> wrote:
>>>>>> > > >
>>>>>> > > > > This is a status update.
>>>>>> > > > >
>>>>>> > > > > At this point 2.33.0 is released, but there are difficulties
>>>>>> with
>>>>>> > > > > accessing the tagged versions using the standard go tools.
>>>>>> It's currently
>>>>>> > > > > under investigation.
>>>>>> > > > >
>>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>>>>>> will populate
>>>>>> > > > > the go.mod file with a pseudo-version rather than the latest tag
>>>>>> (v2.33.0) (e.g.
>>>>>> > > > > the line looks like
>>>>>> > > > > require github.com/apache/beam/sdks/v2
>>>>>> v2.0.0-20211013181004-a9120e083008
>>>>>> > > > > )
>>>>>> > > > >
>>>>>> > > > > While this will work, it's not the desired experience for
>>>>>> users at this
>>>>>> > > > > point. The current downside is that, for reasons not yet understood,
>>>>>> the releases are not
>>>>>> > > > > usable as version targets. However, we retain the other benefits of Go
>>>>>> Modules (actual
>>>>>> > > > > dependency versioning, management by go tools).
>>>>>> > > > >
>>>>>> > > > > The issue is some combination of the go tooling [A], that we
>>>>>> added a go
>>>>>> > > > > mod file outside of the repo root [B], and that we did not
>>>>>> increment the
>>>>>> > > > > major version (v2 -> v3) when adding the go mod file [C].
>>>>>> > > > >
>>>>>> > > > > [B] From the Go documentation, this should be legal and fine,
>>>>>> even if it's
>>>>>> > > > > not recommended. This is fortunate because a go.mod at the root of the
>>>>>> repo would have
>>>>>> > > > > played poorly with the root vendor directory, which the go tools
>>>>>> have opinions
>>>>>> > > > > on.
>>>>>> > > > >
>>>>>> > > > > [C] Incrementing the major version is recommended in the Go
>>>>>> Modules
>>>>>> > > > > documentation when transitioning to Go Modules. However, the docs
>>>>>> never said it
>>>>>> > > > > was required, nor did they indicate this current failure mode.
>>>>>> If anything,
>>>>>> > > > > this should be documented there, if it's not another
>>>>>> bug. We would
>>>>>> > > > > not necessarily want to declare a global v3 for Beam at this
>>>>>> time just for
>>>>>> > > > > the Go SDK; it would become confusing rather quickly.
>>>>>> Notionally there are
>>>>>> > > > > some larger breaking changes the Java and Python SDKs would
>>>>>> want to make in
>>>>>> > > > > such an event, so it's a larger conversation that is
>>>>>> out of scope at
>>>>>> > > > > this time.
>>>>>> > > > >
>>>>>> > > > > This leaves [A], where some misunderstanding of the
>>>>>> documented semantics
>>>>>> > > > > occurred. I certainly expected the tagged version of the
>>>>>> non-root go-module
>>>>>> > > > > to be inherited from the parent, not wholesale ignored. As a
>>>>>> result, I'll
>>>>>> > > > > be filing a bug against the go tools to determine this, and
>>>>>> see what paths
>>>>>> > > > > forward exist.
>>>>>> > > > >
>>>>>> > > > > It's my hope to resolve this before we write a proper
>>>>>> Experimental Exit
>>>>>> > > > > blog post for the Go SDK.
>>>>>> > > > >
>>>>>> > > > > Thank you for your patience, and time.
>>>>>> > > > > Robert Burke
>>>>>> > > > > Beam Go Busybody
>>>>>> > > > >
>>>>>> > > > >
>>>>>> > > > >
>>>>>> > > > >
>>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org>
>>>>>> wrote:
>>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK
>>>>>> now uses Go
>>>>>> > > > > Modules for dependency management, simplifying Go SDK
>>>>>> contributions. [2]
>>>>>> > > > > >
>>>>>> > > > > > The Module file lives in the sdks/ directory so there's a
>>>>>> single Go
>>>>>> > > > > Module for the whole SDK, tests, examples, and any support
>>>>>> code for the
>>>>>> > > > > container boot builds. This excludes the Go SDK Code katas
>>>>>> [3] go modules
>>>>>> > > > > which can be updated once 2.33.0 has been released.
>>>>>> > > > > >
>>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release
>>>>>> builds, and
>>>>>> > > > > by default uses the release-specific container for Docker
>>>>>> execution jobs. For
>>>>>> > > > > at least the 2.33.0 release, this does mean that manual
>>>>>> validation will
>>>>>> > > > > need to explicitly specify RC versions of containers. However,
>>>>>> given that
>>>>>> > > > > the Go SDK container and worker boot process rarely changes,
>>>>>> this is
>>>>>> > > > > unlikely to be an issue.
>>>>>> > > > > >
>>>>>> > > > > > At present I'm cleaning up some of the references to
>>>>>> experimental, and
>>>>>> > > > > making it clear that 2.33.0 is the first non-experimental
>>>>>> release (even
>>>>>> > > > > though that's 4-6 weeks out from actual release.) CHANGES.md
>>>>>> will be
>>>>>> > > > > updated to note the event, but a larger blogpost will happen
>>>>>> after the
>>>>>> > > > > release goes public.
>>>>>> > > > > >
>>>>>> > > > > > Cheers,
>>>>>> > > > > > Robert Burke
>>>>>> > > > > > De facto Beam Go TL.
>>>>>> > > > > >
>>>>>> > > > > > [1]
>>>>>> > > > >
>>>>>> https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>>> > > > > > [3]
>>>>>> https://github.com/apache/beam/tree/master/learning/katas/go
>>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>>> > > > > >
>>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com>
>>>>>> wrote:
>>>>>> > > > > > > +1, congratulations & thank you!
>>>>>> > > > > > >
>>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <
>>>>>> lostluck@apache.org>
>>>>>> > > > > wrote:
>>>>>> > > > > > >
>>>>>> > > > > > > > Regarding documentation update: Initial PR is
>>>>>> > > > > > > > https://github.com/apache/beam/pull/15057 which goes
>>>>>> up to section
>>>>>> > > > > ~4.3.
>>>>>> > > > > > > > JIRA link for Programming Guide changes:
>>>>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>>>>> > > > > > > >
>>>>>> > > > > > > >
>>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <
>>>>>> robert@frantil.com> wrote:
>>>>>> > > > > > > > > Yup!
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go
>>>>>> SDK fully
>>>>>> > > > > into the
>>>>>> > > > > > > > > Beam Programming Guide. I've audited the guide, and
>>>>>> > > > > > > > > am beginning to add missing content and filling in
>>>>>> the Go specific
>>>>>> > > > > gaps.
>>>>>> > > > > > > > > This will be tied to improving the Go Doc with more Go
>>>>>> > > > > > > > > specific user documentation that isn't appropriate
>>>>>> for the BPG.
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > My audit of the guide is here:
>>>>>> > > > > > > > >
>>>>>> > > > > > > >
>>>>>> > > > >
>>>>>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > The other sheets focus on features and tests. The
>>>>>> feature page
>>>>>> > > > > looks
>>>>>> > > > > > > > worse
>>>>>> > > > > > > > > than it is, as it was more productive to focus on
>>>>>> what isn't
>>>>>> > > > > available
>>>>>> > > > > > > > than
>>>>>> > > > > > > > > what is. That's a snapshot of my actual working sheet
>>>>>> but I'll be
>>>>>> > > > > > > > updating
>>>>>> > > > > > > > > it as needed.
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <
>>>>>> iemejia@gmail.com>
>>>>>> > > > > wrote:
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > > Oops, forgot to ask one question. Will this come
>>>>>> with revamped
>>>>>> > > > > > > > > > website instructions/doc for golang too?
>>>>>> > > > > > > > > >
>>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <
>>>>>> iemejia@gmail.com>
>>>>>> > > > > > > > wrote:
>>>>>> > > > > > > > > > >
>>>>>> > > > > > > > > > > Huge +1
>>>>>> > > > > > > > > > >
>>>>>> > > > > > > > > > > This is definitely something many people have
>>>>>> asked about, so
>>>>>> > > > > it is
>>>>>> > > > > > > > > > > great to see it finally happening.
>>>>>> > > > > > > > > > >
>>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <
>>>>>> > > > > kenn@apache.org>
>>>>>> > > > > > > > wrote:
>>>>>> > > > > > > > > > > >
>>>>>> > > > > > > > > > > > +1 awesome
>>>>>> > > > > > > > > > > >
>>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <
>>>>>> > > > > lostluck@apache.org
>>>>>> > > > > > > > >
>>>>>> > > > > > > > > > wrote:
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to
>>>>>> get those (Go
>>>>>> > > > > > > > modules
>>>>>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut, and
>>>>>> certainly
>>>>>> > > > > before the
>>>>>> > > > > > > > 2.33
>>>>>> > > > > > > > > > cut if release images aren't added to the 2.32
>>>>>> process.
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the
>>>>>> future, we may
>>>>>> > > > > want a
>>>>>> > > > > > > > > > harder break between a newer Generics-first API and
>>>>>> the
>>>>>> > > > > current
>>>>>> > > > > > > > version,
>>>>>> > > > > > > > > > but there's no rush. Generics/TypeParameters in Go
>>>>>> aren't
>>>>>> > > > > identical to
>>>>>> > > > > > > > the
>>>>>> > > > > > > > > > feature referred to by that term in Java, C++,
>>>>>> Rust, etc, so
>>>>>> > > > > it'll
>>>>>> > > > > > > > take a
>>>>>> > > > > > > > > > bit of time for that expertise to develop.
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> However, because of the current nature of Go, we had
>>>>>> to have pretty
>>>>>> > > > > > > > > > sophisticated reflective analysis to handle DoFns
>>>>>> and map them
>>>>>> > > > > to their
>>>>>> > > > > > > > > > graph inputs. So, adding new helpers like a KV,
>>>>>> emitter, and
>>>>>> > > > > Iterator
>>>>>> > > > > > > > > > types shouldn't be too difficult. Changing Go SDK
>>>>>> internals to
>>>>>> > > > > use
>>>>>> > > > > > > > > > generics (like the implementation of Stats DoFns
>>>>>> like Min, Max,
>>>>>> > > > > etc.)
>>>>>> > > > > > > > could
>>>>>> > > > > > > > > > also be done transparently to most
>>>>>> users, and
>>>>>> > > > > certainly any
>>>>>> > > > > > > > of
>>>>>> > > > > > > > > > the framework for execution time handling (the
>>>>>> "worker's SDK
>>>>>> > > > > harness")
>>>>>> > > > > > > > > > could be cleaned up if need be. Finally,
>>>>>> adding more
>>>>>> > > > > > > > > > sophisticated DoFn registration and code generation
>>>>>> could
>>>>>> > > > > > > > > > replace the optional code generator entirely,
>>>>>> saving some users
>>>>>> > > > > a `go
>>>>>> > > > > > > > > > generate` step and making improved
>>>>>> execution
>>>>>> > > > > performance easier to get.
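To illustrate the kind of reflective analysis described above, here is a deliberately simplified sketch that classifies a DoFn's parameters from its signature using the standard `reflect` package. It assumes the Go SDK's conventions of `func(*T) bool` for iterators and `func(T)` for emitters; `analyzeDoFn` and `classify` are hypothetical and far less thorough than the SDK's real analysis.

```go
package main

import (
	"fmt"
	"reflect"
)

// paramKind is how this simplified analyzer classifies a DoFn parameter.
type paramKind string

const (
	kindValue    paramKind = "value"    // a plain element, e.g. string or int
	kindIterator paramKind = "iterator" // func(*T) bool or func(*K, *V) bool
	kindEmitter  paramKind = "emitter"  // func(T) or func(K, V), no results
)

func classify(t reflect.Type) paramKind {
	if t.Kind() != reflect.Func {
		return kindValue
	}
	// Iterator: every argument is a pointer and the single result is a bool.
	if t.NumIn() > 0 && t.NumOut() == 1 && t.Out(0).Kind() == reflect.Bool {
		allPtr := true
		for i := 0; i < t.NumIn(); i++ {
			if t.In(i).Kind() != reflect.Ptr {
				allPtr = false
			}
		}
		if allPtr {
			return kindIterator
		}
	}
	// Emitter: takes one or more arguments and returns nothing.
	if t.NumIn() > 0 && t.NumOut() == 0 {
		return kindEmitter
	}
	return kindValue
}

// analyzeDoFn classifies each parameter of fn by inspecting its signature.
func analyzeDoFn(fn any) ([]paramKind, error) {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return nil, fmt.Errorf("not a function: %v", t)
	}
	kinds := make([]paramKind, t.NumIn())
	for i := range kinds {
		kinds[i] = classify(t.In(i))
	}
	return kinds, nil
}

// sumPerKey has the shape of a DoFn after a GroupByKey: a key, an iterator
// over the grouped values, and an emitter for the output.
func sumPerKey(key string, values func(*int) bool, emit func(string, int)) {
	total, v := 0, 0
	for values(&v) {
		total += v
	}
	emit(key, total)
}

func main() {
	kinds, err := analyzeDoFn(sumPerKey)
	fmt.Println(kinds, err) // [value iterator emitter] <nil>
}
```

Generic KV, emitter, and iterator helper types would mainly give this kind of analysis more precise types to recognize; the signature-driven approach itself would not need to change.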
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Changing things like making a Type
>>>>>> Parameterized
>>>>>> > > > > PCollection
>>>>>> > > > > > > > would
>>>>>> > > > > > > > > > be far more involved, as would trying to use some
>>>>>> kind of Apply
>>>>>> > > > > > > > format. The
>>>>>> > > > > > > > > > lack of Method Overrides prevents the apply
>>>>>> chaining approach.
>>>>>> > > > > Or at
>>>>>> > > > > > > > least, it
>>>>>> > > > > > > > > > keeps that approach from working simply.
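One concrete reason apply chaining is awkward in Go: methods cannot declare their own type parameters, so a typed PCollection cannot offer a generic chaining method. A toy sketch, where `KV`, `PCollection`, and `Apply` are hypothetical in-memory stand-ins rather than the Beam API, shows steps nesting as function calls instead.

```go
package main

import "fmt"

// KV is a hypothetical generic key/value pair.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// PCollection is a toy in-memory stand-in for a typed PCollection;
// it is not the Beam API.
type PCollection[T any] struct {
	elems []T
}

// Go methods cannot declare their own type parameters, so a chainable
//
//	func (p PCollection[T]) Apply[U any](fn func(T, func(U))) PCollection[U]
//
// does not compile. A package-level generic function is needed instead.
func Apply[T, U any](p PCollection[T], fn func(T, func(U))) PCollection[U] {
	var out []U
	emit := func(u U) { out = append(out, u) }
	for _, e := range p.elems {
		fn(e, emit)
	}
	return PCollection[U]{elems: out}
}

func main() {
	words := PCollection[string]{elems: []string{"go", "beam", "sdk"}}
	// Without method-level type parameters, steps nest instead of chaining.
	lengths := Apply(
		Apply(words, func(w string, emit func(KV[string, int])) {
			emit(KV[string, int]{Key: w, Value: len(w)})
		}),
		func(kv KV[string, int], emit func(int)) { emit(kv.Value) },
	)
	fmt.Println(lengths.elems) // [2 4 3]
}
```

The nested calls read inside-out, which is part of why a type-parameterized PCollection would be a larger redesign than adding generic helper types.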
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Finally, Go Generics won't be available until
>>>>>> Go 1.18,
>>>>>> > > > > which isn't
>>>>>> > > > > > > > > > until next year. See
>>>>>> https://blog.golang.org/generics-proposal
>>>>>> > > > > for
>>>>>> > > > > > > > > > details.
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17
>>>>>> does include a
>>>>>> > > > > register-based
>>>>>> > > > > > > > > > calling convention, leading to a modest performance
>>>>>> improvement
>>>>>> > > > > across
>>>>>> > > > > > > > the
>>>>>> > > > > > > > > > board.
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> Cheers,
>>>>>> > > > > > > > > > > >> Robert Burke
>>>>>> > > > > > > > > > > >>
>>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>>>>>> > > > > robertwb@google.com>
>>>>>> > > > > > > > wrote:
>>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of
>>>>>> experimental once
>>>>>> > > > > the Go
>>>>>> > > > > > > > > > Modules
>>>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK
>>>>>> needs to support
>>>>>> > > > > every
>>>>>> > > > > > > > > > feature
>>>>>> > > > > > > > > > > >> > to be accepted, especially now that we can do
>>>>>> > > > > cross-language
>>>>>> > > > > > > > > > > >> > transforms, and Go definitely supports
>>>>>> enough to be quite
>>>>>> > > > > > > > useful.
>>>>>> > > > > > > > > > (WRT
>>>>>> > > > > > > > > > > >> > streaming, my understanding is that Go
>>>>>> supports the
>>>>>> > > > > streaming
>>>>>> > > > > > > > model
>>>>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine
>>>>>> on a streaming
>>>>>> > > > > > > > runner,
>>>>>> > > > > > > > > > even
>>>>>> > > > > > > > > > > >> > if more advanced features like state and
>>>>>> timers aren't yet
>>>>>> > > > > > > > > > available.)
>>>>>> > > > > > > > > > > >> >
>>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>>> > > > > > > > > > > >> >
>>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson
>>>>>> Hamilton <
>>>>>> > > > > > > > tysonjh@google.com>
>>>>>> > > > > > > > > > wrote:
>>>>>> > > > > > > > > > > >> > >
>>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>>> > > > > > > > > > > >> > >
>>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental
>>>>>> status after Go
>>>>>> > > > > Modules
>>>>>> > > > > > > > > > are completed and the LICENSE issue is resolved. I
>>>>>> don't think
>>>>>> > > > > that
>>>>>> > > > > > > > lacking
>>>>>> > > > > > > > > > streaming support is a blocker. The other thing I
>>>>>> checked
>>>>>> > > > > was whether
>>>>>> > > > > > > > > > there were metrics available on
>>>>>> metrics.beam.apache.org,
>>>>>> > > > > specifically
>>>>>> > > > > > > > for
>>>>>> > > > > > > > > > measuring code health via post-commit over time,
>>>>>> which there are
>>>>>> > > > > and
>>>>>> > > > > > > > the
>>>>>> > > > > > > > > > passing test rate is high (Huzzah!). The one thing
>>>>>> that
>>>>>> > > > > surprised me
>>>>>> > > > > > > > from
>>>>>> > > > > > > > > > your summary is that when Go introduces generics it
>>>>>> won't result
>>>>>> > > > > in any
>>>>>> > > > > > > > > > backwards incompatible changes in Apache Beam.
>>>>>> That's great
>>>>>> > > > > news, but
>>>>>> > > > > > > > does
>>>>>> > > > > > > > > > it mean there will be a need to support both
>>>>>> non-generic and
>>>>>> > > > > generic
>>>>>> > > > > > > > APIs
>>>>>> > > > > > > > > > moving forward? It seems like generics will be
>>>>>> introduced in the
>>>>>> > > > > Go
>>>>>> > > > > > > > 1.17
>>>>>> > > > > > > > > > release (optimistically) in August this year.
>>>>>> > > > > > > > > > > >> > >
>>>>>> > > > > > > > > > > >> > >
>>>>>> > > > > > > > > > > >> > >
>>>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert
>>>>>> Burke <
>>>>>> > > > > > > > lostluck@apache.org>
>>>>>> > > > > > > > > > wrote:
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam
>>>>>> Go SDK
>>>>>> > > > > > > > experimental.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a
>>>>>> community, and any
>>>>>> > > > > > > > conditions
>>>>>> > > > > > > > > > that remain that would prevent the exit.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> tl;dr;
>>>>>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I
>>>>>> have both.
>>>>>> > > > > > > > > > > >> > >> This entails including it officially in
>>>>>> the Release
>>>>>> > > > > process,
>>>>>> > > > > > > > > > removing the various "experimental" text throughout
>>>>>> the repo etc,
>>>>>> > > > > > > > > > > >> > >> and otherwise treating it like Python and
>>>>>> Java. Some Go
>>>>>> > > > > > > > specific
>>>>>> > > > > > > > > > tasks around dep versioning.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the beam model
>>>>>> efficiently for
>>>>>> > > > > most
>>>>>> > > > > > > > batch
>>>>>> > > > > > > > > > tasks, including basic windowing.
>>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are
>>>>>> tested on all
>>>>>> > > > > > > > Portable
>>>>>> > > > > > > > > > runners.
>>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change in
>>>>>> incompatible
>>>>>> > > > > ways
>>>>>> > > > > > > > going
>>>>>> > > > > > > > > > forward.
>>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written through
>>>>>> > > > > SplittableDoFns or
>>>>>> > > > > > > > > > via Cross Language transforms.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but
>>>>>> keeping it
>>>>>> > > > > > > > experimental
>>>>>> > > > > > > > > > doesn't help with that any further.
>>>>>> > > > > > > > > > > >> > >> Communities grow through contributions
>>>>>> and use, and
>>>>>> > > > > > > > experimental
>>>>>> > > > > > > > > > markers dissuade users.
>>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what
>>>>>> can be done
>>>>>> > > > > with
>>>>>> > > > > > > > the
>>>>>> > > > > > > > > > SDK. (Contributions welcome)
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Typically when we call an SDK or API
>>>>>> Experimental, it's
>>>>>> > > > > > > > because
>>>>>> > > > > > > > > > there's a risk that API or behaviors may change
>>>>>> significantly.
>>>>>> > > > > > > > > > > >> > >> This, in turn, leads to additional work
>>>>>> for users of
>>>>>> > > > > the SDK
>>>>>> > > > > > > > on
>>>>>> > > > > > > > > > every release which leads to sticking to older
>>>>>> versions or
>>>>>> > > > > forking
>>>>>> > > > > > > > > > > >> > >> to preserve behavior. Version updates
>>>>>> should be looked
>>>>>> > > > > > > > forward
>>>>>> > > > > > > > > > to, and viewed as having little risk. Further, while
>>>>>> there's been
>>>>>> > > > > > > > > > > >> > >> previous discussion about what the "low
>>>>>> bar" is for a
>>>>>> > > > > new
>>>>>> > > > > > > > SDK, it
>>>>>> > > > > > > > > > hasn't been summarily applied to the Go SDK. I feel
>>>>>> this has
>>>>>> > > > > > > > > > > >> > >> hurt development and contribution of new
>>>>>> SDK languages
>>>>>> > > > > > > > (inherent
>>>>>> > > > > > > > > > difficulty of SDK development notwithstanding).
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't
>>>>>> entirely clear
>>>>>> > > > > what the
>>>>>> > > > > > > > > > Beam Model should look like in an opinionated
>>>>>> language like Go.
>>>>>> > > > > > > > > > > >> > >> Their initial take (see
>>>>>> > > > > > > > > > https://s.apache.org/beam-go-sdk-design-rfc [0])
>>>>>> goes into
>>>>>> > > > > detail
>>>>>> > > > > > > > what it
>>>>>> > > > > > > > > > means for a language without
>>>>>> > > > > > > > > > > >> > >> Generics, or overloading, or inheritance
>>>>>> to implement
>>>>>> > > > > the
>>>>>> > > > > > > > beam
>>>>>> > > > > > > > > > model. One could largely throw away static types
>>>>>> (like Python),
>>>>>> > > > > > > > > > > >> > >> but this approach rings hollow for Go. It
>>>>>> would not do
>>>>>> > > > > if the
>>>>>> > > > > > > > > > approach couldn't grow and scale to the Beam Model.
>>>>>> It's also
>>>>>> > > > > hard
>>>>>> > > > > > > > > > > >> > >> to tell if an API is any good before
>>>>>> there are users.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Further, in the early days of
>>>>>> Portability, there
>>>>>> > > > > wasn't a
>>>>>> > > > > > > > way to
>>>>>> > > > > > > > > > write scalable DoFns, dynamically or otherwise.
>>>>>> It's an
>>>>>> > > > > incredible
>>>>>> > > > > > > > > > > >> > >> bottleneck to need to do all initial
>>>>>> fanout of work on
>>>>>> > > > > a
>>>>>> > > > > > > > single
>>>>>> > > > > > > > > > machine, write everything to a Reshuffle, just in
>>>>>> order to scale
>>>>>> > > > > up.
>>>>>> > > > > > > > > > > >> > >> Without being able to scale, Beam is
>>>>>> little more than
>>>>>> > > > > > > > overhead.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are
>>>>>> met within the
>>>>>> > > > > Go SDK
>>>>>> > > > > > > > for
>>>>>> > > > > > > > > > open source.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Background
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the beam
>>>>>> repo for a few
>>>>>> > > > > years
>>>>>> > > > > > > > now,
>>>>>> > > > > > > > > > since it was accidentally merged into master.
>>>>>> > > > > > > > > > > >> > >> Since then it's been called experimental,
>>>>>> and not
>>>>>> > > > > officially
>>>>>> > > > > > > > > > part of the releases.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Of the SDKs, it alone was always designed
>>>>>> around Beam
>>>>>> > > > > Portability
>>>>>> > > > > > > > > > first. It never had any "Legacy" (SDK x Runner
>>>>>> specific)
>>>>>> > > > > workers.
>>>>>> > > > > > > > > > > >> > >> It's always used the Beam Pipeline protos
>>>>>> and FnAPI to
>>>>>> > > > > > > > execute
>>>>>> > > > > > > > > > jobs, first with some very experimental code on
>>>>>> Dataflow, but now
>>>>>> > > > > > > > > > > >> > >> on all portable supported runners, like
>>>>>> Flink, Spark,
>>>>>> > > > > the
>>>>>> > > > > > > > Python
>>>>>> > > > > > > > > > Portable runner, and Dataflow.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> API Stability
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed
>>>>>> its user API
>>>>>> > > > > for DoFn
>>>>>> > > > > > > > > > and pipeline construction since it was first merged
>>>>>> in, and
>>>>>> > > > > there are
>>>>>> > > > > > > > no
>>>>>> > > > > > > > > > > >> > >> changes to that on the horizon that can't
>>>>>> be made in a
>>>>>> > > > > > > > backwards
>>>>>> > > > > > > > > > compatible manner. Largely these are related to New
>>>>>> Features, or
>>>>>> > > > > > > > > > > >> > >> usability improvements enabled by the
>>>>>> advent of Go
>>>>>> > > > > Generics
>>>>>> > > > > > > > > > (think of "real" KV, emitter, and iterator types).
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has
>>>>>> largely been
>>>>>> > > > > under
>>>>>> > > > > > > > work
>>>>>> > > > > > > > > > for use within Google. Its use is called FlumeGo,
>>>>>> representing
>>>>>> > > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top of
>>>>>> Flume,
>>>>>> > > > > Google's
>>>>>> > > > > > > > batch
>>>>>> > > > > > > > > > pipeline processing engine. Thus most of the focus has been
>>>>>> on improving
>>>>>> > > > > > > > > > > >> > >> batch execution. FlumeGo sees ample use
>>>>>> today, and
>>>>>> > > > > there
>>>>>> > > > > > > > hasn't
>>>>>> > > > > > > > > > been a call for fundamental changes to the API for
>>>>>> ergonomic or
>>>>>> > > > > > > > > > > >> > >> usability concerns.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Scalability
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Google could get away without the Go SDK
>>>>>> having an SDK
>>>>>> > > > > side
>>>>>> > > > > > > > > > scalability solution as a result of its
>>>>>> integration with Flume.
>>>>>> > > > > > > > > > > >> > >> However, those days are now past.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns
>>>>>> along with
>>>>>> > > > > Dynamic
>>>>>> > > > > > > > > > Splitting, which supports writing scalable batch
>>>>>> transforms
>>>>>> > > > > natively
>>>>>> > > > > > > > > > > >> > >> in the Go SDK.
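Both kinds of splitting can be sketched over a simplified offset-range restriction. `OffsetRange`, `EvenSplits`, and `SplitRemainder` are hypothetical names for illustration (the Go SDK ships its own restriction and tracker types). Initial splitting divides the work before processing starts, while a dynamic split carves off the unprocessed remainder of an in-flight range so another worker can take it.

```go
package main

import "fmt"

// OffsetRange is a simplified restriction over the half-open range [Start, End).
type OffsetRange struct {
	Start, End int64
}

// EvenSplits performs initial splitting: it divides r into at most n
// contiguous, non-overlapping sub-ranges of near-equal size.
func (r OffsetRange) EvenSplits(n int64) []OffsetRange {
	total := r.End - r.Start
	if n <= 1 || total <= 1 {
		return []OffsetRange{r}
	}
	if n > total {
		n = total
	}
	out := make([]OffsetRange, 0, n)
	for i := int64(0); i < n; i++ {
		out = append(out, OffsetRange{
			Start: r.Start + total*i/n,
			End:   r.Start + total*(i+1)/n,
		})
	}
	return out
}

// SplitRemainder performs a dynamic split while r is being processed: pos is
// the next unclaimed offset, and fraction (0 < fraction < 1) is the share of
// the remaining work the primary keeps. The residual can go to another worker.
func (r OffsetRange) SplitRemainder(pos int64, fraction float64) (primary, residual OffsetRange, ok bool) {
	if fraction <= 0 || fraction >= 1 || pos < r.Start || pos >= r.End {
		return r, OffsetRange{}, false
	}
	split := pos + int64(float64(r.End-pos)*fraction)
	if split <= pos || split >= r.End {
		return r, OffsetRange{}, false // too little work left to split
	}
	return OffsetRange{r.Start, split}, OffsetRange{split, r.End}, true
}

func main() {
	fmt.Println(OffsetRange{0, 10}.EvenSplits(3)) // [{0 3} {3 6} {6 10}]
	p, res, ok := OffsetRange{0, 100}.SplitRemainder(40, 0.5)
	fmt.Println(p, res, ok) // {0 70} {70 100} true
}
```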
>>>>>> > > > > > > > > > > >> > >> The SDK also supports Cross Language
>>>>>> Transforms, with
>>>>>> > > > > Beam
>>>>>> > > > > > > > > > Schema encodings. With it, production hardened
>>>>>> transforms
>>>>>> > > > > > > > > > > >> > >> from Java and Python are a wrapper away.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who
>>>>>> implemented the SDF
>>>>>> > > > > side
>>>>>> > > > > > > > work,
>>>>>> > > > > > > > > > and completed the Xlang work) is adding a wrapper
>>>>>> for the
>>>>>> > > > > > > > > > > >> > >> Java Kafka IO using Cross Language
>>>>>> Transforms, which
>>>>>> > > > > has often
>>>>>> > > > > > > > > > been requested. This will also enable use of the
>>>>>> Beam SQL
>>>>>> > > > > > > > > > > >> > >> transforms that Java enables.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Features
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam core.
>>>>>> The Go SDK
>>>>>> > > > > implements
>>>>>> > > > > > > > > > standard coders, allows for user DoFns, and
>>>>>> CombineFns and access
>>>>>> > > > > > > > > > > >> > >> to core transforms like Flatten,
>>>>>> GroupByKey, and
>>>>>> > > > > features
>>>>>> > > > > > > > like
>>>>>> > > > > > > > > > Side Inputs, Windowing, and User Metrics.
>>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported
>>>>>> for batch even
>>>>>> > > > > > > > through
>>>>>> > > > > > > > > > lifted combines in the 2.32.0 release.
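Combiner lifting means partially combining values before the GroupByKey shuffle, so only compact accumulators cross the network. A runner-free sketch, assuming a mean combine: `meanFn` follows the method names of a Go SDK CombineFn (`CreateAccumulator`, `AddInput`, `MergeAccumulators`, `ExtractOutput`), while `liftedCombine` is a hypothetical stand-in for what a runner does, not SDK code.

```go
package main

import "fmt"

// meanAccum is the accumulator: a running sum and count.
type meanAccum struct {
	Sum   float64
	Count int
}

// meanFn uses the CombineFn method names; it is a plain struct with no SDK dependency here.
type meanFn struct{}

func (meanFn) CreateAccumulator() meanAccum { return meanAccum{} }

func (meanFn) AddInput(a meanAccum, x float64) meanAccum {
	a.Sum += x
	a.Count++
	return a
}

func (meanFn) MergeAccumulators(a, b meanAccum) meanAccum {
	return meanAccum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAccum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// liftedCombine mimics combiner lifting: each bundle is reduced to a partial
// accumulator before the shuffle, so only the partials are merged afterward.
func liftedCombine(fn meanFn, bundles [][]float64) float64 {
	merged := fn.CreateAccumulator()
	for _, bundle := range bundles {
		partial := fn.CreateAccumulator() // pre-shuffle, per bundle
		for _, x := range bundle {
			partial = fn.AddInput(partial, x)
		}
		merged = fn.MergeAccumulators(merged, partial) // post-shuffle
	}
	return fn.ExtractOutput(merged)
}

func main() {
	fmt.Println(liftedCombine(meanFn{}, [][]float64{{1, 2, 3}, {4}, {5, 6}})) // 3.5
}
```

Because the accumulator is just a sum and a count, merging partials gives the same mean as combining every input in one place.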
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be
>>>>>> versatile for
>>>>>> > > > > batch
>>>>>> > > > > > > > > > execution on portable runners, and for simple
>>>>>> streaming
>>>>>> > > > > pipelines.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Repo Testing
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> On precommit the Go SDK runs all its
>>>>>> unit tests. On
>>>>>> > > > > top of
>>>>>> > > > > > > > > > that, it runs all its integration tests against
>>>>>> the Python
>>>>>> > > > > Portable
>>>>>> > > > > > > > runner,
>>>>>> > > > > > > > > > > >> > >> making it quick and robust to detect
>>>>>> breaking changes
>>>>>> > > > > without
>>>>>> > > > > > > > > > overspending community resources. Those same tests
>>>>>> are also
>>>>>> > > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The tests are executable against all
>>>>>> runners via the
>>>>>> > > > > > > > appropriate
>>>>>> > > > > > > > > > Go commands (if you've stood up your own job
>>>>>> management server),
>>>>>> > > > > > > > > > > >> > >> or Gradle commands (which will spin up
>>>>>> runner
>>>>>> > > > > instances for
>>>>>> > > > > > > > > > you). Documentation for executing tests and adding
>>>>>> new ones
>>>>>> > > > > > > > > > > >> > >> is on the wiki. [2] They are accessible
>>>>>> to Go
>>>>>> > > > > developers as
>>>>>> > > > > > > > > > they're implemented with the standard Go testing
>>>>>> tools.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Shortcomings
>>>>>> > > > > > > > > > > >> > >> That said, there's still much to do. Let
>>>>>> me briefly
>>>>>> > > > > tell you
>>>>>> > > > > > > > > > what doesn't work, and it's up to you to weigh
>>>>>> whether they block
>>>>>> > > > > > > > > > > >> > >> being out of experimental.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> At present, only textio has been
>>>>>> implemented as
>>>>>> > > > > Splittable
>>>>>> > > > > > > > > > DoFn.
>>>>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it
>>>>>> will serve as
>>>>>> > > > > the
>>>>>> > > > > > > > > > first example for future contributions for
>>>>>> > > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
>>>>>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at
>>>>>> this point
>>>>>> > > > > users are
>>>>>> > > > > > > > > > empowered to write their own DoFns or wrap existing
>>>>>> transforms
>>>>>> > > > > for
>>>>>> > > > > > > > Cross
>>>>>> > > > > > > > > > Language use.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> In the core SDK, more streaming-focused features have yet to be implemented, but they're largely additions to what already exists rather than total rebuilds. Much of the work is defining how a user specifies what they want, and turning that into the appropriate FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] about what's missing for additional streaming features.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK, in particular regarding the included transform libraries and examples.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Moving Forward
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide (BPG). I've audited the guide [3], and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG, and to resolving the LICENSE issue around the public display of that GoDoc.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process and remove the "experimental" language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers. As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning to work on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will also remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and to provide additional links as needed. I intentionally avoided a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can rather than that they shouldn't.
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> Robert Burke
>>>>>> > > > > > > > > > > >> > >> Defacto Beam Go TL
>>>>>> > > > > > > > > > > >> > >>
>>>>>> > > > > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
>>>>>> > > > > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>>>>>> > > > > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>>>>>> > > > > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
>>>>>> > > > > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>>>>>> > > > > > > > > > > >> >
>>>>>> > > > > > > > > >
>>>>>> > > > > > > > >
>>>>>> > > > > > > >
>>>>>> > > > > > >
>>>>>> > > > > >
>>>>>> > > > >
>>>>>> > > >
>>>>>> > >
>>>>>>
>>>>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
It's my great pleasure to announce that the Apache Beam Go SDK is no longer
experimental. https://beam.apache.org/blog/go-sdk-release/

Thank you everyone.
Robert Burke
Beam Go Busybody

On Thu, Nov 4, 2021, 6:29 PM Robert Burke <ro...@frantil.com> wrote:

> At this point I just need an LGTM on the blog post PR, as the draft is
> finalized.
>
> Udi added the sdks/v2.33.0 tag which works as expected. I've also verified
> that the appropriate container is used by default when not specified which
> is the last unknown in this process.
>
> Who's ready to release a new SDK? I am!
>
>  https://github.com/apache/beam/pull/15894 (or join the exciting reaction
> emoji on the top post).
>
>
>
> On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:
>
>> The current draft of the exit blog post is
>> https://github.com/apache/beam/pull/15894
>> Comments are very welcome. I'm going to continue looking for Known issues
>> (which will be linked to their respective JIRAs) tomorrow.
>>
>> Since RC1 is getting cycled, I can also go back to the original plan of
>> v2.33.0, if we'd like to get it out this week.
>>
>>
>> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>>
>>> Investigation yielded that there's no way around the prefixed tags. The
>>> JIRA has been commented with the explanation.
>>>
>>> https://github.com/apache/beam/pull/15881 has the release script
>>> updates.
>>>
>>> I'm working on the Exit blogpost and the updated Go SDK roadmap. The
>>> draft PR will be linked here.
>>>
>>> Since 2.34.0 is almost out (assuming RC1 verification goes well) I'm
>>> inclined to wait for that release to finish before publishing the blogpost.
>>> I'll link the draft PR here as soon as it's ready.
>>>
>>> Once 2.34.0 is released, I'm inclined to still have 2.33.0 be also
>>> prefix tagged so there isn't a gap in versions between the unmoduled code
>>> and moduled code.
>>>
>>> Once published, that'll be the end of this thread.
>>>
>>> Thank you very much everyone.
>>>
>>> Robert Burke
>>> Beam Go Busybody
>>>
>>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>>>
>>>> +1 to extra tags. They'll be trivial to add to our release process, and
>>>> git tags are lightweight by design so I don't foresee any problems.
>>>>
>>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com> wrote:
>>>>
>>>>> Glad you were able to figure it out. The extra tags are certainly
>>>>> worth making this work if it's what we have to do, and shouldn't be
>>>>> too much of a problem (until, hopefully, it's fixed on the go side).
>>>>>
>>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
>>>>> >
>>>>> > With Kyle's help with the additional tagging of the next RC, we have validated that this is the currently correct approach.
>>>>> >
>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>>> >
>>>>> > Or even:
>>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to latest tagged version)
>>>>> >
>>>>> > The main cost to this approach is doubling the number of tags in the tags list (https://github.com/apache/beam/tags), which is not ideal but overall a small cost. There's no need for a "full publish" of these additional tags, so we won't be doubling our "releases" (see https://github.com/apache/beam/releases).
>>>>> >
>>>>> > I'll still be filing a bug against the Go commands, since the mandatory prefixing is unintuitive and seems unnecessary. If the prefixing does become unnecessary, we can always delete the tags from the affected branches and stop adding them going forward. I'll search through the existing Go issues first, however, to see if this has been previously discussed, and report my findings here either way.
>>>>> >
>>>>> > This does require two small changes to the release guide: the RC tagging script and the final tagging step:
>>>>> >
>>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>>> >
>>>>> > I'll make this change later this week (or early next) assuming there are no objections.
>>>>> >
>>>>> > Thank you all very much for your patience,
>>>>> > Robert Burke
>>>>> > Beam Go Busybody
>>>>> >
>>>>> >
>>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>>> > > After much research into the Go Modules documentation, I have confirmed what the issue is.
>>>>> > >
>>>>> > > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change, it captures the Java and Python container boot code (written in Go) in the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.
>>>>> > >
>>>>> > > However, we missed that when doing so, the standard version tags only apply to the module at the repository root, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version, but quoting the important paragraph:
>>>>> > >
>>>>> > > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of
>>>>> > > > the module path is not empty, then each tag name must be prefixed with the module subdirectory,
>>>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls
>>>>> > > > subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module
>>>>> > > > must have the tag named gopls/v0.4.0 in that repository.
>>>>> > >
>>>>> > > Specifically, for the Go SDK to be fetchable at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>>> > >
>>>>> > > So, the fix for the Go versioning issue is to amend our release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
>>>>> > >
>>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can back-update the 2.33.0 release branch with such a prefixed tag. At that point I can also write the official Experimental Exit blog post.
>>>>> > >
>>>>> > > Thank you all for your patience.
>>>>> > > Robert Burke
>>>>> > >
>>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>>> > > >
>>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lostluck@apache.org> wrote:
>>>>> > > >
>>>>> > > > > This is a status update.
>>>>> > > > >
>>>>> > > > > At this point 2.33.0 is released, but there are difficulties with accessing the tagged versions using the standard Go tools. It's currently under investigation.
>>>>> > > > >
>>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy` will populate the file with a pseudo-version rather than the latest tag (v2.33.0), e.g. the line looks like:
>>>>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>>>> > > > >
>>>>> > > > > While this will work, it's not the desired experience for users at this point. The current downside is that, for some reason, the releases are not meaningful targets. However, we retain the other benefits of Go Modules (actual dependency versioning, management by the Go tools).
>>>>> > > > >
>>>>> > > > > The issue is some combination of the Go tooling [A], the fact that we added a go.mod file outside of the repo root [B], and the fact that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].
>>>>> > > > >
>>>>> > > > > [B] According to the Go documentation, this should be legal and fine, even if it's not recommended. This is fortunate, because a go.mod at the root of the repo would have played poorly with the root vendor directory, which the Go tools have opinions on.
>>>>> > > > >
>>>>> > > > > [C] The Go Modules documentation recommends incrementing the major version when transitioning to Go Modules. However, it never says this is required, nor does it describe the current failure mode. If this isn't another bug, it should at least be documented there. We would not want to declare a global v3 for Beam just for the Go SDK at this time; it would become confusing rather quickly. Notionally, the Java and Python SDKs would want to make some larger breaking changes in such an event, so it's a larger conversation that is out of scope for now.
>>>>> > > > >
>>>>> > > > > This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root Go module to be inherited from the parent, not wholesale ignored. As a result, I'll be filing a bug against the Go tools to clarify this, and to see what paths forward exist.
>>>>> > > > >
>>>>> > > > > It's my hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.
>>>>> > > > >
>>>>> > > > > Thank you for your patience, and time.
>>>>> > > > > Robert Burke
>>>>> > > > > Beam Go Busybody
>>>>> > > > >
>>>>> > > > >
>>>>> > > > >
>>>>> > > > >
>>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions [2].
>>>>> > > > > >
>>>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go modules of the Go SDK code katas [3], which can be updated once 2.33.0 has been released.
>>>>> > > > > >
>>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
>>>>> > > > > >
>>>>> > > > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from the actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
>>>>> > > > > >
>>>>> > > > > > Cheers,
>>>>> > > > > > Robert Burke
>>>>> > > > > > Defacto Beam Go TL.
>>>>> > > > > >
>>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>>> > > > > >
>>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>>>> > > > > > > +1, congratulations & thank you!
>>>>> > > > > > >
>>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
>>>>> > > > > > >
>>>>> > > > > > > > Regarding the documentation update: the initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
>>>>> > > > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
>>>>> > > > > > > >
>>>>> > > > > > > >
>>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
>>>>> > > > > > > > > Yup!
>>>>> > > > > > > > >
>>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG.
>>>>> > > > > > > > >
>>>>> > > > > > > > > My audit of the guide is here:
>>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>> > > > > > > > >
>>>>> > > > > > > > > The other sheets focus on features and tests. The feature page looks worse than it is, as it was more productive to focus on what isn't available than on what is. That's a snapshot of my actual working sheet, but I'll be updating it as needed.
>>>>> > > > > > > > >
>>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>> > > > > > > > >
>>>>> > > > > > > > > > Oops, I forgot to write one question. Will this come with revamped website instructions/docs for golang too?
>>>>> > > > > > > > > >
>>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>>> > > > > > > > > > >
>>>>> > > > > > > > > > > Huge +1
>>>>> > > > > > > > > > >
>>>>> > > > > > > > > > > This is definitely something many people have asked about, so it is great to see it finally happening.
>>>>> > > > > > > > > > >
>>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
>>>>> > > > > > > > > > > >
>>>>> > > > > > > > > > > > +1 awesome
>>>>> > > > > > > > > > > >
>>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and the LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Regarding Go generics: at some point in the future, we may want a harder break between a newer generics-first API and the current version, but there's no rush. Generics/type parameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> However, given the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and Iterator types shouldn't be too difficult. Changes to Go SDK internals to use generics (like the implementation of Stats DoFns such as Min, Max, etc.) could also be made transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance easier to get.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Changes like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply-chaining approach, or at least prevents it from working simply.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Finally, Go generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> Cheers,
>>>>> > > > > > > > > > > >> Robert Burke
>>>>> > > > > > > > > > > >>
>>>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
>>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules issues are solved. I don't think an SDK needs to support every feature to be accepted, especially now that we can do cross-language transforms, and Go definitely supports enough to be quite useful. (WRT streaming, my understanding is that Go supports the streaming model with windows and timestamps, and runs fine on a streaming runner, even if more advanced features like state and timers aren't yet available.)
>>>>> > > > > > > > > > > >> >
>>>>> > > > > > > > > > > >> > This is a great milestone.
>>>>> > > > > > > > > > > >> >
>>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
>>>>> > > > > > > > > > > >> > >
>>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>>> > > > > > > > > > > >> > >
>>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, and there are; the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics, it won't result in any backwards-incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
>>>>> > > > > > > > > > > >> > >
>>>>> > > > > > > > > > > >> > >
>>>>> > > > > > > > > > > >> > >
>>>>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Hello Beam Community!
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> This thread is to discuss it as a community, along with any remaining conditions that would prevent the exit.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> tl;dr;
>>>>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I have both.
>>>>> > > > > > > > > > > >> > >> This entails including it officially in the release process, removing the various "experimental" text throughout the repo, etc., and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
>>>>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested, on all portable runners.
>>>>> > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
>>>>> > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via cross-language transforms.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
>>>>> > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
>>>>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Typically, when we call an SDK or API experimental, it's because there's a risk that the API or its behavior may change significantly. This, in turn, leads to additional work for users of the SDK on every release, which leads them to stick with older versions or fork to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has hurt development and contribution of new SDK languages (the inherent difficulty of SDK development notwithstanding).
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go. The initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail about what it means for a language without generics, overloading, or inheritance to implement the Beam model. One could largely throw away static types (as Python does), but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard to tell if an API is any good before there are users.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible bottleneck to have to do all initial fanout of work on a single machine and write everything to a Reshuffle, just in order to scale up. Without being able to scale, Beam is little more than overhead.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> At this point, both of these needs are met
>>>>> within the
>>>>> > > > > Go SDK
>>>>> > > > > > > > for
>>>>> > > > > > > > > > open source.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Background
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK has been a part of the beam
>>>>> repo for a few
>>>>> > > > > years
>>>>> > > > > > > > now,
>>>>> > > > > > > > > > since it was accidentally merged into master.
>>>>> > > > > > > > > > > >> > >> Since then it's been called experimental,
>>>>> and not
>>>>> > > > > officially
>>>>> > > > > > > > > > part of the releases.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Of the SDKs, it's was always designed
>>>>> around Beam
>>>>> > > > > Portability
>>>>> > > > > > > > > > first. It never had any "Legacy" (SDK x Runner
>>>>> specific )
>>>>> > > > > workers.
>>>>> > > > > > > > > > > >> > >> It's always used the Beam Pipeline protos
>>>>> and FnAPI to
>>>>> > > > > > > > execute
>>>>> > > > > > > > > > jobs, first with some very experimental code on
>>>>> Dataflow, but now
>>>>> > > > > > > > > > > >> > >> on all portable supported runners, like
>>>>> Flink, Spark,
>>>>> > > > > the
>>>>> > > > > > > > Python
>>>>> > > > > > > > > > Portable runner, and Dataflow.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> API Stability
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed
>>>>> it's user API
>>>>> > > > > for DoFn
>>>>> > > > > > > > > > and pipeline construction since it was first merged
>>>>> in, and
>>>>> > > > > there are
>>>>> > > > > > > > no
>>>>> > > > > > > > > > > >> > >> changes to that on the horizon that can't
>>>>> be made in a
>>>>> > > > > > > > backwards
>>>>> > > > > > > > > > compatible manner. Largely these are related to New
>>>>> Features, or
>>>>> > > > > > > > > > > >> > >> usability improvements enabled by the
>>>>> advent of Go
>>>>> > > > > Generics
>>>>> > > > > > > > > > (think of "real" KV, emitter, and iterator types).
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has
>>>>> largely been
>>>>> > > > > under
>>>>> > > > > > > > work
>>>>> > > > > > > > > > for use within Google. It's use is called FlumeGo,
>>>>> representing
>>>>> > > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top of
>>>>> Flume,
>>>>> > > > > Google's
>>>>> > > > > > > > batch
>>>>> > > > > > > > > > pipeline processing engine. Thus most of the focus
>>>>> on improving
>>>>> > > > > > > > > > > >> > >> batch execution. FlumeGo sees ample use
>>>>> today, and
>>>>> > > > > there
>>>>> > > > > > > > hasn't
>>>>> > > > > > > > > > been a call for fundamental changes to the API for
>>>>> ergonomic or
>>>>> > > > > > > > > > > >> > >> usability concerns.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Scalability
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Google could get away without the Go SDK
>>>>> having an SDK
>>>>> > > > > side
>>>>> > > > > > > > > > scalability solution as a result of it's integration
>>>>> with Flume.
>>>>> > > > > > > > > > > >> > >> However, those days are now past.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns
>>>>> along with
>>>>> > > > > Dynamic
>>>>> > > > > > > > > > Splitting, which supports writing scalable batch
>>>>> transforms
>>>>> > > > > natively
>>>>> > > > > > > > > > > >> > >> in the Go SDK.
>>>>> > > > > > > > > > > >> > >> The SDK also supports Cross Language
>>>>> Transforms, with
>>>>> > > > > Beam
>>>>> > > > > > > > > > Schema encodings. With it, production hardened
>>>>> transforms
>>>>> > > > > > > > > > > >> > >> from Java and Python are a wrapper away.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who
>>>>> implemented the SDF
>>>>> > > > > side
>>>>> > > > > > > > work,
>>>>> > > > > > > > > > and completed the Xlang work,) is adding a wrapper
>>>>> for the
>>>>> > > > > > > > > > > >> > >> Java Kafka IO using Cross Language
>>>>> Transforms, which
>>>>> > > > > is often
>>>>> > > > > > > > > > been requested. This will also enable use of the
>>>>> Beam SQL
>>>>> > > > > > > > > > > >> > >> transforms that java enables.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Features
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The Go SDK implements the Beam C=core. The
>>>>> Go SDK
>>>>> > > > > implements
>>>>> > > > > > > > > > standard coders, allows for user DoFns, and
>>>>> CombineFns and access
>>>>> > > > > > > > > > > >> > >> to core transforms like Flatten,
>>>>> GroupByKey, and
>>>>> > > > > features
>>>>> > > > > > > > like
>>>>> > > > > > > > > > Side Inputs, Windowing, and User Metrics.
>>>>> > > > > > > > > > > >> > >> Basic windowing will be fully supported
>>>>> for batch even
>>>>> > > > > > > > through
>>>>> > > > > > > > > > lifted combines in the 2.32.0 release.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be
>>>>> versatile for
>>>>> > > > > batch
>>>>> > > > > > > > > > execution on portable runners, and for simple
>>>>> streaming
>>>>> > > > > pipelines.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Repo Testing
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> On precommit the Go SDK runs all it's unit
>>>>> tests. On
>>>>> > > > > top of
>>>>> > > > > > > > > > that, it runs all it's integration tests against the
>>>>> Python
>>>>> > > > > Portable
>>>>> > > > > > > > runner,
>>>>> > > > > > > > > > > >> > >> making it quick and robust to detect
>>>>> breaking changes
>>>>> > > > > without
>>>>> > > > > > > > > > overspending community resources. Those same tests
>>>>> are also
>>>>> > > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The tests are executable against all
>>>>> runners via the
>>>>> > > > > > > > appropriate
>>>>> > > > > > > > > > Go commands (if you've stood up your own job
>>>>> management server),
>>>>> > > > > > > > > > > >> > >> or Gradle commands (which will spin up
>>>>> runner
>>>>> > > > > instances for
>>>>> > > > > > > > > > you). Documentation for executing tests and adding
>>>>> new ones
>>>>> > > > > > > > > > > >> > >> is on the wiki. [2] They are accessible to
>>>>> Go
>>>>> > > > > developers as
>>>>> > > > > > > > > > they're implemented with the standard Go testing
>>>>> tools.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Shortcomings
>>>>> > > > > > > > > > > >> > >> That said, there's still much to do. Let
>>>>> me briefly
>>>>> > > > > tell you
>>>>> > > > > > > > > > what doesn't work, and it's up to you to weigh
>>>>> whether they block
>>>>> > > > > > > > > > > >> > >> being out of experimental.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> At present, only a textio has been
>>>>> implemented as
>>>>> > > > > Splittable
>>>>> > > > > > > > > > DoFn.
>>>>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it
>>>>> will serve as
>>>>> > > > > a the
>>>>> > > > > > > > > > first example for future contributions for
>>>>> > > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
>>>>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at
>>>>> this point
>>>>> > > > > users are
>>>>> > > > > > > > > > empowered to write their own DoFns or wrap existing
>>>>> transforms
>>>>> > > > > for
>>>>> > > > > > > > Cross
>>>>> > > > > > > > > > Language use.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> In the core SDK, more streaming focused
>>>>> features have
>>>>> > > > > yet to
>>>>> > > > > > > > be
>>>>> > > > > > > > > > implemented, but they're largely additions to what
>>>>> exists already
>>>>> > > > > > > > > > > >> > >> rather than total rebuilds. Much of the
>>>>> work is
>>>>> > > > > definining
>>>>> > > > > > > > how a
>>>>> > > > > > > > > > user specifies their desires, and turning those into
>>>>> the
>>>>> > > > > appropriate
>>>>> > > > > > > > > > > >> > >> FnAPI requests at execution time. Back in
>>>>> October I
>>>>> > > > > wrote at
>>>>> > > > > > > > > > length on the wiki [1] what's missing for additional
>>>>> streaming
>>>>> > > > > > > > features.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> While we have bolstered our testing
>>>>> recently, there's
>>>>> > > > > likely
>>>>> > > > > > > > > > still more we could test to improve our confidence
>>>>> in the SDK,
>>>>> > > > > > > > > > > >> > >> in particular regarding the included
>>>>> transforms
>>>>> > > > > libraries and
>>>>> > > > > > > > > > examples.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Moving Forward
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> My immediate plan is to work on
>>>>> incorporating the Go
>>>>> > > > > SDK
>>>>> > > > > > > > fully
>>>>> > > > > > > > > > into the Beam Programming Guide. I've audited the
>>>>> guide [3], and
>>>>> > > > > > > > > > > >> > >> am beginning to add missing content and
>>>>> filling in the
>>>>> > > > > Go
>>>>> > > > > > > > > > specific gaps. This will be tied to improving the Go
>>>>> Doc with
>>>>> > > > > more Go
>>>>> > > > > > > > > > > >> > >> specific user documentation that isn't
>>>>> appropriate for
>>>>> > > > > the
>>>>> > > > > > > > BPG.
>>>>> > > > > > > > > > > >> > >> And resolving the LICENSE issue around the
>>>>> public
>>>>> > > > > display of
>>>>> > > > > > > > > > that GoDoc.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> If this proposal is accepted by a binding
>>>>> vote, I will
>>>>> > > > > > > > > > incorporate the SDK into the release process, and
>>>>> remove the
>>>>> > > > > > > > "experimental"
>>>>> > > > > > > > > > > >> > >> language around the SDK. This largely
>>>>> entails updating
>>>>> > > > > the
>>>>> > > > > > > > > > release scripts to also build and publish the Go SDK
>>>>> Docker
>>>>> > > > > containers.
>>>>> > > > > > > > > > > >> > >> As for releasing the code, we're
>>>>> technically already
>>>>> > > > > doing so
>>>>> > > > > > > > > > whenever we tag a release branch [4].
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> The clearest signal to the Go community
>>>>> however will be
>>>>> > > > > > > > > > migrating the SDK to use Go Modules for dependency
>>>>> version
>>>>> > > > > control,
>>>>> > > > > > > > > > > >> > >> which Daniel is planning on working on
>>>>> after his Kafka
>>>>> > > > > task.
>>>>> > > > > > > > > > This will put our repo infrastructure, SDK
>>>>> contributors, and
>>>>> > > > > users
>>>>> > > > > > > > > > > >> > >> on the same footing when it comes to
>>>>> dependency
>>>>> > > > > management.
>>>>> > > > > > > > It
>>>>> > > > > > > > > > will remove the "+incompatible" tags one sees on the
>>>>> > > > > > > > > > > >> > >> pkg.go.dev list at [4].
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> I'm very happy to answer any questions you
>>>>> might have
>>>>> > > > > about
>>>>> > > > > > > > the
>>>>> > > > > > > > > > SDK, and provide additional links as needed. I
>>>>> intentionally
>>>>> > > > > avoided
>>>>> > > > > > > > > > > >> > >> a link barrage in this email, as they can
>>>>> distract
>>>>> > > > > from the
>>>>> > > > > > > > > > point: The SDK is ready for folks to use it, we need
>>>>> to tell
>>>>> > > > > them that
>>>>> > > > > > > > they
>>>>> > > > > > > > > > can
>>>>> > > > > > > > > > > >> > >> rather than they shouldn't.
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> Robert Burke
>>>>> > > > > > > > > > > >> > >> Defacto Beam Go TL
>>>>> > > > > > > > > > > >> > >>
>>>>> > > > > > > > > > > >> > >> [0]
>>>>> https://s.apache.org/beam-go-sdk-design-rfc
>>>>> > > > > > > > > > > >> > >> [1]
>>>>> > > > > > > > > >
>>>>> > > > > > > >
>>>>> > > > >
>>>>> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>>>>> > > > > > > > > > > >> > >> [2]
>>>>> > > > > https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>>>>> > > > > > > > > > > >> > >> [3]
>>>>> > > > > > > > > >
>>>>> > > > > > > >
>>>>> > > > >
>>>>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>>> > > > > > > > > > (SDK Audit sheet)
>>>>> > > > > > > > > > > >> > >> [4]
>>>>> > > > > > > > > >
>>>>> > > > > > > >
>>>>> > > > >
>>>>> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>>>>> > > > > > > > > > > >> >
>>>>> > > > > > > > > >
>>>>> > > > > > > > >
>>>>> > > > > > > >
>>>>> > > > > > >
>>>>> > > > > >
>>>>> > > > >
>>>>> > > >
>>>>> > >
>>>>>
>>>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
At this point I just need an LGTM on the blog post PR, as the draft is
finalized.

Udi added the sdks/v2.33.0 tag, which works as expected. I've also verified
that the appropriate container is used by default when none is specified,
which was the last unknown in this process.

Who's ready to release a new SDK? I am!

https://github.com/apache/beam/pull/15894 (or join in on the exciting
reaction emoji on the top post).



On Wed, Nov 3, 2021, 8:37 PM Robert Burke <ro...@frantil.com> wrote:

> The current draft of the exit blog post is
> https://github.com/apache/beam/pull/15894
> Comments are very welcome. I'm going to continue looking for known issues
> (which will be linked to their respective JIRAs) tomorrow.
>
> Since RC1 is getting cycled, I can also go back to the original plan of
> v2.33.0, if we'd like to get it out this week.
>
>
> On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:
>
>> Investigation confirmed that there's no way around the prefixed tags. The
>> JIRA has been updated with a comment explaining why.
>>
>> https://github.com/apache/beam/pull/15881 has the release script updates.
>>
>> I'm working on the exit blog post and the updated Go SDK roadmap, and I'll
>> link the draft PR here as soon as it's ready.
>>
>> Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm
>> inclined to wait for that release to finish before publishing the blog post.
>>
>> Once 2.34.0 is released, I'm inclined to still add a prefixed tag for
>> 2.33.0 as well, so there isn't a gap in versions between the code from
>> before and after the switch to Go Modules.
>>
>> Once published, that'll be the end of this thread.
>>
>> Thank you very much, everyone.
>>
>> Robert Burke
>> Beam Go Busybody
>>
>> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>>
>>> +1 to extra tags. They'll be trivial to add to our release process, and
>>> git tags are lightweight by design so I don't foresee any problems.
>>>
>>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
>>> wrote:
>>>
>>>> Glad you were able to figure it out. The extra tags are certainly
>>>> worth making this work if it's what we have to do, and shouldn't be
>>>> too much of a problem (until, hopefully, it's fixed on the go side).
>>>>
>>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org>
>>>> wrote:
>>>> >
>>>> > With Kyle's help with the additional tagging of the next RC, we have
>>>> > validated that this is the correct approach for now.
>>>> >
>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>>> >
>>>> > Or even:
>>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to
>>>> > the latest tagged version)
>>>> >
>>>> > The main cost of this approach is doubling the number of tags in the tags
>>>> > list (https://github.com/apache/beam/tags), which is not ideal, but is a
>>>> > small cost overall. There's no need for a "full publish" of these
>>>> > additional tags, so we won't be doubling our "releases" (see
>>>> > https://github.com/apache/beam/releases).
>>>> >
>>>> > I'll still be filing a bug against the Go commands, since the mandatory
>>>> > prefixing is unintuitive and seems unnecessary. If the prefix ever
>>>> > becomes unnecessary, we can always delete the tags from the affected
>>>> > branches and stop adding them going forward. I'll search through the
>>>> > existing Go issues first, however, to see if this has been discussed
>>>> > before, and report my findings here either way.
>>>> >
>>>> > This does require two small changes to the release guide: the RC
>>>> > tagging script, and the final tagging step:
>>>> >
>>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>>> >
>>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>>> >
>>>> > I'll make this change later this week (or early next), assuming there
>>>> > are no objections.
>>>> >
>>>> > Thank you all very much for your patience,
>>>> > Robert Burke
>>>> > Beam Go Busybody
>>>> >
>>>> >
>>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>>> > > With much research in reading the Go Modules documentation, I have
>>>> > > confirmed what the issue is.
>>>> > >
>>>> > > We added the go.mod file to sdks/ under the repo root because it's a
>>>> > > cleaner spot for the change, it captures the Java and Python container
>>>> > > boot code (written in Go) in the module, and it avoids conflicting
>>>> > > interpretations of the vendor directory that lives at the root level.
>>>> > >
>>>> > > However, we missed that, when doing so, the standard version tags only
>>>> > > apply to modules at the root level, not to modules in subdirectories.
>>>> > > See https://golang.org/ref/mod#vcs-version; quoting the important
>>>> > > paragraph:
>>>> > >
>>>> > > > If a module is defined in a subdirectory within the repository, that
>>>> > > > is, the module subdirectory portion of the module path is not empty,
>>>> > > > then each tag name must be prefixed with the module subdirectory,
>>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls
>>>> > > > is defined in the gopls subdirectory of the repository with root path
>>>> > > > golang.org/x/tools. The version v0.4.0 of that module must have the
>>>> > > > tag named gopls/v0.4.0 in that repository.
>>>> > >
>>>> > > Specifically, for the Go SDK to be fetched at the right version, we
>>>> > > need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>>> > >
>>>> > > So the fix for the Go versioning issue is to amend our release process
>>>> > > (including generating release candidate builds) to also add a prefixed
>>>> > > version tag with the same version.
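The tag-naming rule quoted above can be sketched in Go. This is a minimal illustration under stated assumptions, not code from Beam's release scripts: `tagFor` is a hypothetical helper, and it assumes, as the Go Modules reference describes, that the module subdirectory excludes any major version suffix such as /v2. That is why the module github.com/apache/beam/sdks/v2 needs tags of the form sdks/vX.Y.Z rather than sdks/v2/vX.Y.Z.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major version suffix such as "/v2" or "/v10".
// Assumption (per the Go Modules reference): this suffix is not part of the
// module subdirectory, so it never appears in the tag prefix.
var majorSuffix = regexp.MustCompile(`/v([2-9]|[1-9][0-9]+)$`)

// tagFor is a hypothetical helper that returns the VCS tag name the go
// command looks for when resolving version of modulePath, hosted in the
// repository whose root path is repoRoot.
func tagFor(repoRoot, modulePath, version string) (string, error) {
	trimmed := majorSuffix.ReplaceAllString(modulePath, "")
	if trimmed != repoRoot && !strings.HasPrefix(trimmed, repoRoot+"/") {
		return "", fmt.Errorf("module %q is not inside repository %q", modulePath, repoRoot)
	}
	// The module subdirectory is whatever remains after the repository root.
	subdir := strings.TrimPrefix(strings.TrimPrefix(trimmed, repoRoot), "/")
	if subdir == "" {
		// The module path names no subdirectory: a plain semver tag is used.
		return version, nil
	}
	return subdir + "/" + version, nil
}

func main() {
	for _, v := range []string{"v2.33.0", "v2.34.0-RC1"} {
		tag, err := tagFor("github.com/apache/beam", "github.com/apache/beam/sdks/v2", v)
		if err != nil {
			panic(err)
		}
		fmt.Println(tag)
	}
	// Prints:
	// sdks/v2.33.0
	// sdks/v2.34.0-RC1
}
```

Because the module lives in sdks/, each release commit in this scheme carries both Beam's usual tag (for example v2.33.0) and the prefixed sdks/v2.33.0 tag, which is the doubling of the tags list discussed in this thread.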
>>>> > >
>>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are
>>>> > > no objections we can retroactively add such a prefixed tag to the
>>>> > > 2.33.0 release branch. At that point I can also write the official
>>>> > > Experimental Exit blog post.
>>>> > >
>>>> > > Thank you all for your patience.
>>>> > > Robert Burke
>>>> > >
>>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>>> > > > Thank you for the detailed update! Let us know if we can help.
>>>> > > >
>>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:
>>>> > > >
>>>> > > > > This is a status update.
>>>> > > > >
>>>> > > > > At this point 2.33.0 is released, but there are difficulties
>>>> > > > > accessing the tagged versions using the standard Go tools. It's
>>>> > > > > currently under investigation.
>>>> > > > >
>>>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>>>> > > > > populates the go.mod file with a pseudo-version rather than the
>>>> > > > > latest tag (v2.33.0). For example, the line looks like:
>>>> > > > >
>>>> > > > >     require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
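To read the require line above: per the Go Modules reference, a pseudo-version of the form vX.0.0-yyyymmddhhmmss-abcdefabcdef is used when there is no known base version, and it encodes the commit's UTC timestamp and a 12-character commit hash prefix. That is consistent with the go command not recognizing the unprefixed v2.33.0 tag for a module that lives in sdks/. The sketch below uses a hypothetical `parsePseudo` helper with a deliberately simplified pattern; it does not validate every pseudo-version form (the golang.org/x/mod/module package, for example IsPseudoVersion, does that).

```go
package main

import (
	"fmt"
	"regexp"
	"time"
)

// pseudoTail captures the UTC timestamp (yyyymmddhhmmss) and the 12-character
// revision at the end of a pseudo-version, allowing optional +build metadata.
// Simplified: it does not check the base-version part as strictly as the go
// command does.
var pseudoTail = regexp.MustCompile(`^v[0-9]+\..*[-.]([0-9]{14})-([0-9a-f]{12})(\+[0-9A-Za-z.-]+)?$`)

// pseudo holds the parts of a pseudo-version that identify a commit.
type pseudo struct {
	Time     time.Time // commit time, UTC
	Revision string    // commit hash prefix
}

// parsePseudo is a hypothetical helper that splits a pseudo-version.
func parsePseudo(v string) (pseudo, error) {
	m := pseudoTail.FindStringSubmatch(v)
	if m == nil {
		return pseudo{}, fmt.Errorf("%q does not look like a pseudo-version", v)
	}
	// The layout has no zone, so time.Parse returns a UTC time.
	t, err := time.Parse("20060102150405", m[1])
	if err != nil {
		return pseudo{}, err
	}
	return pseudo{Time: t, Revision: m[2]}, nil
}

func main() {
	p, err := parsePseudo("v2.0.0-20211013181004-a9120e083008")
	if err != nil {
		panic(err)
	}
	fmt.Println(p.Time.Format(time.RFC3339), p.Revision)
	// Prints: 2021-10-13T18:10:04Z a9120e083008
}
```

A tagged version such as v2.33.0 does not match the pattern, which is the distinction the status update is about: go.mod records a commit rather than a release.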
>>>> > > > >
>>>> > > > > While this will work, it's not the desired experience for users at
>>>> > > > > this point. The current downside is that, for some reason, the
>>>> > > > > releases are not meaningful version targets. However, we retain the
>>>> > > > > other benefits of Go Modules (actual dependency versioning, and
>>>> > > > > management by the go tools).
>>>> > > > >
>>>> > > > > The issue is some combination of the go tooling [A], the fact that
>>>> > > > > we added a go.mod file outside of the repo root [B], and the fact
>>>> > > > > that we did not increment the major version (v2 -> v3) when adding
>>>> > > > > the go.mod file [C].
>>>> > > > >
>>>> > > > > [B] According to the Go documentation, this should be legal and
>>>> > > > > fine, even if it's not recommended. This is fortunate, because
>>>> > > > > placing it at the root of the repo would have played poorly with
>>>> > > > > the root vendor directory, which the go tools have opinions on.
>>>> > > > >
>>>> > > > > [C] The Go Modules documentation recommends incrementing the major
>>>> > > > > version when transitioning to Go Modules. However, it never said it
>>>> > > > > was required, nor did it describe this failure mode. If anything,
>>>> > > > > this should be documented there, if it's not another bug. We would
>>>> > > > > not necessarily want to declare a global v3 for Beam at this time
>>>> > > > > just for the Go SDK; it would become confusing rather quickly.
>>>> > > > > Notionally, there are some larger breaking changes the Java and
>>>> > > > > Python SDKs would want to make in such an event, so it's a larger
>>>> > > > > conversation that is out of scope at this time.
>>>> > > > >
>>>> > > > > This leaves [A], where some misunderstanding of the documented
>>>> > > > > semantics occurred. I certainly expected the tagged version of the
>>>> > > > > non-root Go module to be inherited from the parent, not wholesale
>>>> > > > > ignored. As a result, I'll be filing a bug against the go tools to
>>>> > > > > determine this, and see what paths forward exist.
>>>> > > > >
>>>> > > > > It's my hope to resolve this before we write a proper Experimental
>>>> > > > > Exit blog post for the Go SDK.
>>>> > > > >
>>>> > > > > Thank you for your patience, and time.
>>>> > > > > Robert Burke
>>>> > > > > Beam Go Busybody
>>>> > > > >
>>>> > > > >
>>>> > > > >
>>>> > > > >
>>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now
>>>> > > > > > uses Go Modules for dependency management, simplifying Go SDK
>>>> > > > > > contributions [2].
>>>> > > > > >
>>>> > > > > > The module file lives in the sdks/ directory, so there's a single
>>>> > > > > > Go module for the whole SDK, tests, examples, and any support code
>>>> > > > > > for the container boot builds. This excludes the Go SDK code katas
>>>> > > > > > [3], whose Go modules can be updated once 2.33.0 has been released.
>>>> > > > > >
>>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds,
>>>> > > > > > and by default uses the release-specific container for Docker
>>>> > > > > > execution jobs. For at least the 2.33.0 release, this does mean
>>>> > > > > > that manual validation will need to explicitly specify RC versions
>>>> > > > > > of containers. However, given that the Go SDK container and worker
>>>> > > > > > boot process rarely change, this is unlikely to be an issue.
>>>> > > > > >
>>>> > > > > > At present I'm cleaning up some of the references to experimental,
>>>> > > > > > and making it clear that 2.33.0 is the first non-experimental
>>>> > > > > > release (even though that's 4-6 weeks out from actual release).
>>>> > > > > > CHANGES.md will be updated to note the event, but a larger blog
>>>> > > > > > post will happen after the release goes public.
>>>> > > > > >
>>>> > > > > > Cheers,
>>>> > > > > > Robert Burke
>>>> > > > > > Defacto Beam Go TL.
>>>> > > > > >
>>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>>> > > > > >
>>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>>> > > > > > > +1, congratulations & thank you!
>>>> > > > > > >
>>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
>>>> > > > > > >
>>>> > > > > > > > Regarding the documentation update: the initial PR is
>>>> > > > > > > > https://github.com/apache/beam/pull/15057, which goes up to
>>>> > > > > > > > section ~4.3.
>>>> > > > > > > > JIRA link for Programming Guide changes:
>>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>>> > > > > > > >
>>>> > > > > > > >
>>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
>>>> > > > > > > > > Yup!
>>>> > > > > > > > >
>>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK
>>>> > > > > > > > > fully into the Beam Programming Guide. I've audited the
>>>> > > > > > > > > guide, and am beginning to add missing content and fill in
>>>> > > > > > > > > the Go-specific gaps. This will be tied to improving the
>>>> > > > > > > > > GoDoc with more Go-specific user documentation that isn't
>>>> > > > > > > > > appropriate for the BPG.
>>>> > > > > > > > >
>>>> > > > > > > > > My audit of the guide is here:
>>>> > > > > > > > >
>>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>>> > > > > > > > >
>>>> > > > > > > > > The other sheets focus on features and tests. The feature
>>>> > > > > > > > > page looks worse than it is, as it was more productive to
>>>> > > > > > > > > focus on what isn't available than on what is. That's a
>>>> > > > > > > > > snapshot of my actual working sheet, but I'll be updating it
>>>> > > > > > > > > as needed.
>>>> > > > > > > > >
>>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>> > > > > > > > >
>>>> > > > > > > > > > Oops, forgot to write one question. Will this come with
>>>> > > > > > > > > > revamped website instructions/docs for golang too?
>>>> > > > > > > > > >
>>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <iemejia@gmail.com> wrote:
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > Huge +1
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > This is definitely something many people have asked
>>>> > > > > > > > > > > about, so it is great to see it finally happening.
>>>> > > > > > > > > > >
>>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > +1 awesome
>>>> > > > > > > > > > > >
>>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
>>>> > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those
>>>> > > > > > > > > > > > >> (Go modules and the LICENSE issue) done before the 2.32
>>>> > > > > > > > > > > > >> cut, and certainly before the 2.33 cut if release images
>>>> > > > > > > > > > > > >> aren't added to the 2.32 process.
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Regarding Go Generics: at some point in the future, we
>>>> > > > > > > > > > > > >> may want a harder break between a newer generics-first
>>>> > > > > > > > > > > > >> API and the current version, but there's no rush.
>>>> > > > > > > > > > > > >> Generics/type parameters in Go aren't identical to the
>>>> > > > > > > > > > > > >> feature referred to by that term in Java, C++, Rust,
>>>> > > > > > > > > > > > >> etc., so it'll take a bit of time for that expertise to
>>>> > > > > > > > > > > > >> develop.
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> However, because of the current nature of Go, we had to
>>>> > > > > > > > > > > > >> have pretty sophisticated reflective analysis to handle
>>>> > > > > > > > > > > > >> DoFns and map them to their graph inputs. So adding new
>>>> > > > > > > > > > > > >> helpers, like KV, emitter, and iterator types, shouldn't
>>>> > > > > > > > > > > > >> be too difficult. Changing Go SDK internals to use
>>>> > > > > > > > > > > > >> generics (like the implementation of stats DoFns such as
>>>> > > > > > > > > > > > >> Min, Max, etc.) could also be done transparently to most
>>>> > > > > > > > > > > > >> users, and certainly any of the framework for
>>>> > > > > > > > > > > > >> execution-time handling (the "worker's SDK harness")
>>>> > > > > > > > > > > > >> could be cleaned up if need be. Finally, adding more
>>>> > > > > > > > > > > > >> sophisticated DoFn registration and code generation
>>>> > > > > > > > > > > > >> could replace the optional code generator entirely,
>>>> > > > > > > > > > > > >> saving some users a `go generate` step and simplifying
>>>> > > > > > > > > > > > >> getting improved execution performance.
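The message above expects typed KV, emitter, and iterator helpers to be additive once generics arrive in Go 1.18. As a purely hypothetical illustration (the names KV, Emitter, Iter, sumValues, and sliceIter are assumptions for this sketch, not Beam Go SDK API), type parameters could attach static types to the function shapes the SDK currently analyzes reflectively: an emitter as func(T), and an iterator over grouped values as func(*T) bool. It requires Go 1.18 or later.

```go
package main

import "fmt"

// KV is a hypothetical type-parameterized key/value pair (not a Beam type).
type KV[K, V any] struct {
	Key   K
	Value V
}

// Emitter is a hypothetical typed emitter, mirroring the func(T) shape
// that Go SDK DoFns use for emitters.
type Emitter[T any] func(T)

// Iter is a hypothetical typed iterator, mirroring the func(*T) bool shape:
// it fills *T and reports whether a value was produced.
type Iter[T any] func(*T) bool

// sumValues drains an iterator of ints, as a DoFn consuming grouped values
// might, and emits one KV holding the key and the total.
func sumValues(key string, values Iter[int], emit Emitter[KV[string, int]]) {
	total := 0
	var v int
	for values(&v) {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

// sliceIter adapts a slice to the Iter shape so the sketch runs locally.
func sliceIter[T any](xs []T) Iter[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

func main() {
	var got []KV[string, int]
	sumValues("a", sliceIter([]int{1, 2, 3}), func(kv KV[string, int]) {
		got = append(got, kv)
	})
	fmt.Println(got) // Prints: [{a 6}]
}
```

The point of the design is that a mistyped emitter or iterator becomes a compile-time error instead of something caught by reflective analysis at pipeline construction.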
>>>> > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Changing things like making a type-parameterized
>>>> > > > > > > > > > > > >> PCollection would be far more involved, as would trying
>>>> > > > > > > > > > > > >> to use some kind of Apply format. The lack of method
>>>> > > > > > > > > > > > >> overrides prevents the apply chaining approach, or at
>>>> > > > > > > > > > > > >> least prevents it from working simply.
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18,
>>>> > > > > > > > > > > > >> which isn't until next year. See
>>>> > > > > > > > > > > > >> https://blog.golang.org/generics-proposal for details.
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a
>>>> > > > > > > > > > > > >> register-based calling convention, leading to a modest
>>>> > > > > > > > > > > > >> performance improvement across the board.
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> Cheers,
>>>> > > > > > > > > > > > >> Robert Burke
>>>> > > > > > > > > > > > >>
>>>> > > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
>>>> > > > > > > > > > > >> > +1 to declaring Golang support out of
>>>> experimental once
>>>> > > > > the Go
>>>> > > > > > > > > > Modules
>>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs
>>>> to support
>>>> > > > > every
>>>> > > > > > > > > > feature
>>>> > > > > > > > > > > >> > to be accepted, especially now that we can do
>>>> > > > > cross-language
>>>> > > > > > > > > > > >> > transforms, and Go definitely supports enough
>>>> to be quite
>>>> > > > > > > > useful.
>>>> > > > > > > > > > (WRT
>>>> > > > > > > > > > > >> > streaming, my understanding is that Go
>>>> supports the
>>>> > > > > streaming
>>>> > > > > > > > model
>>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on
>>>> a streaming
>>>> > > > > > > > runner,
>>>> > > > > > > > > > even
>>>> > > > > > > > > > > >> > if more advanced features like state and
>>>> timers aren't yet
>>>> > > > > > > > > > available.)
>>>> > > > > > > > > > > >> >
>>>> > > > > > > > > > > >> > This is a great milestone.
>>>> > > > > > > > > > > >> >
>>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
>>>> > > > > > > > > > > >> > >
>>>> > > > > > > > > > > >> > > WOW! Big news.
>>>> > > > > > > > > > > >> > >
>>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go
>>>> > > > > > > > > > > >> > > Modules are completed and the LICENSE issue is resolved. I
>>>> > > > > > > > > > > >> > > don't think that lacking streaming support is a blocker.
>>>> > > > > > > > > > > >> > > The other thing I checked was whether there were metrics
>>>> > > > > > > > > > > >> > > available on metrics.beam.apache.org, specifically for
>>>> > > > > > > > > > > >> > > measuring code health via post-commit over time, which
>>>> > > > > > > > > > > >> > > there are, and the passing test rate is high (Huzzah!).
>>>> > > > > > > > > > > >> > > The one thing that surprised me from your summary is that
>>>> > > > > > > > > > > >> > > when Go introduces generics it won't result in any
>>>> > > > > > > > > > > >> > > backwards incompatible changes in Apache Beam. That's great
>>>> > > > > > > > > > > >> > > news, but does it mean there will be a need to support
>>>> > > > > > > > > > > >> > > both non-generic and generic APIs moving forward? It seems
>>>> > > > > > > > > > > >> > > like generics will be introduced in the Go 1.17 release
>>>> > > > > > > > > > > >> > > (optimistically) in August this year.

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
The current draft of the exit blog post is
https://github.com/apache/beam/pull/15894
Comments are very welcome. I'm going to continue looking for Known issues
(which will be linked to their respective JIRAs) tomorrow.

Since RC1 is getting cycled, I can also go back to the original plan of
v2.33.0, if we'd like to get it out this week.


On Wed, 3 Nov 2021 at 10:17, Robert Burke <ro...@frantil.com> wrote:

> Investigation yielded that there's no way around the prefixed tags. The
> JIRA has been commented with the explanation.
>
> https://github.com/apache/beam/pull/15881 has the release script updates.
>
> I'm working on the Exit blogpost and the updated Go SDK roadmap. The draft
> PR will be linked here.
>
> Since 2.34.0 is almost out (assuming RC1 verification goes well) I'm
> inclined to wait for that release to finish before publishing the blogpost.
> I'll link the draft PR here as soon as it's ready.
>
> Once 2.34.0 is released, I'm inclined to also prefix-tag 2.33.0 so there
> isn't a gap in versions between the unmoduled code and the moduled code.
>
> Once published, that'll be the end of this thread.
>
> Thank you very much everyone.
>
> Robert Burke
> Beam Go Busybody
>
> On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:
>
>> +1 to extra tags. They'll be trivial to add to our release process, and
>> git tags are lightweight by design so I don't foresee any problems.
>>
>> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
>> wrote:
>>
>>> Glad you were able to figure it out. The extra tags are certainly
>>> worth making this work if it's what we have to do, and shouldn't be
>>> too much of a problem (until, hopefully, it's fixed on the go side).
>>>
>>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org>
>>> wrote:
>>> >
>>> > With Kyle's help with the additional tagging of the next RC, we have
>>> > validated that this is the currently correct approach.
>>> >
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>>> >
>>> > Or even:
>>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to
>>> > the latest tagged version)
>>> >
>>> > The main cost of this approach is doubling the number of tags in the tags
>>> > list (https://github.com/apache/beam/tags), which is not ideal but overall
>>> > a small cost. There's no need for a "full publish" of these additional
>>> > tags, so we won't be doubling our "releases" (see
>>> > https://github.com/apache/beam/releases).
>>> >
>>> > I'll still be filing a bug against the Go commands, since the mandatory
>>> > prefixing is unintuitive and seems unnecessary. If the prefixing stops
>>> > being necessary, we can always delete the tags from the affected branches
>>> > and cease the behavior going forward. I'll search through the existing Go
>>> > issues first, however, to see if this has been previously discussed, and
>>> > report my findings here either way.
>>> >
>>> > This does require two small changes to the release guide: the RC tagging
>>> > script, and the final tagging step:
>>> >
>>> > https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>>> >
>>> > https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>>> >
>>> > I'll make this change later this week (or early next), assuming there are
>>> > no objections.
>>> >
>>> > Thank you all very much for your patience,
>>> > Robert Burke
>>> > Beam Go Busybody
>>> >
>>> >
>>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>>> > > After much reading of the Go Modules documentation, I have confirmed
>>> > > what the issue is.
>>> > >
>>> > > We added the go.mod file to sdks/ under the repo root because it's a
>>> > > cleaner spot for the change, it captures the Java and Python container
>>> > > boot code (written in Go) in the module, and it avoids conflicting
>>> > > interpretations of the vendor directory that lives at the root level.
>>> > >
>>> > > However, we missed that when doing so, the standard version tags would
>>> > > only apply to modules at the root level, not to modules in
>>> > > subdirectories. See https://golang.org/ref/mod#vcs-version; quoting the
>>> > > important paragraph:
>>> > >
>>> > > > If a module is defined in a subdirectory within the repository, that
>>> > > > is, the module subdirectory portion of the module path is not empty,
>>> > > > then each tag name must be prefixed with the module subdirectory,
>>> > > > followed by a slash. For example, the module golang.org/x/tools/gopls
>>> > > > is defined in the gopls subdirectory of the repository with root path
>>> > > > golang.org/x/tools. The version v0.4.0 of that module must have the
>>> > > > tag named gopls/v0.4.0 in that repository.
>>> > >
>>> > > Specifically, for the Go SDK to be fetchable at the right version, we
>>> > > need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
>>> > >
>>> > > So, the fix for the Go versioning issue is to amend our release process
>>> > > (including generating Release Candidate builds) to also add a prefixed
>>> > > version tag with the same version.
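The tagging rule quoted above can be sketched in Go. This is a minimal, hypothetical helper (`moduleSubdir` and `versionTag` are illustrative names, not part of the go tooling or the Beam release scripts). It assumes, as the Go Modules reference describes, that the module subdirectory is the module path minus the repository root and minus any major-version suffix such as `/v2`. For `github.com/apache/beam/sdks/v2` it yields the `sdks/` prefix discussed in this thread.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major-version suffix such as "/v2" or "/v10".
// That suffix is not part of the module subdirectory used as a tag prefix.
var majorSuffix = regexp.MustCompile(`/v(?:[2-9]|[1-9][0-9]+)$`)

// moduleSubdir returns the repository subdirectory holding the module's
// go.mod, or "" when the module lives at the repository root.
func moduleSubdir(repoRoot, modulePath string) (string, error) {
	if !strings.HasPrefix(modulePath, repoRoot) {
		return "", fmt.Errorf("module %q is not under repository root %q", modulePath, repoRoot)
	}
	rest := strings.TrimPrefix(modulePath, repoRoot)
	if rest != "" && !strings.HasPrefix(rest, "/") {
		// e.g. "github.com/apache/beamx" is not inside "github.com/apache/beam".
		return "", fmt.Errorf("module %q is not under repository root %q", modulePath, repoRoot)
	}
	rest = majorSuffix.ReplaceAllString(rest, "")
	return strings.TrimPrefix(rest, "/"), nil
}

// versionTag returns the git tag name the go command expects for the given
// semantic version of the module.
func versionTag(repoRoot, modulePath, version string) (string, error) {
	subdir, err := moduleSubdir(repoRoot, modulePath)
	if err != nil {
		return "", err
	}
	if subdir == "" {
		return version, nil
	}
	return subdir + "/" + version, nil
}

func main() {
	for _, c := range []struct{ root, mod, ver string }{
		{"github.com/apache/beam", "github.com/apache/beam/sdks/v2", "v2.34.0-RC1"},
		{"golang.org/x/tools", "golang.org/x/tools/gopls", "v0.4.0"},
	} {
		tag, err := versionTag(c.root, c.mod, c.ver)
		if err != nil {
			panic(err)
		}
		fmt.Println(tag)
	}
	// Prints:
	// sdks/v2.34.0-RC1
	// gopls/v0.4.0
}
```

Under this sketch, a release would push both the plain tag (v2.34.0-RC1) and the prefixed one (sdks/v2.34.0-RC1), which is where the doubling of tags mentioned in this thread comes from.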
>>> > >
>>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are
>>> > > no objections we can retroactively add such a prefixed tag to the
>>> > > 2.33.0 release branch, at which point I can also write the official
>>> > > Experimental Exit blog post.
>>> > >
>>> > > Thank you all for your patience.
>>> > > Robert Burke
>>> > >
>>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>>> > > > Thank you for the detailed update! Let us know if we can help.
>>> > > >
>>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org>
>>> > > > wrote:
>>> > > >
>>> > > > > This is a status update.
>>> > > > >
>>> > > > > At this point 2.33.0 is released, but there are difficulties with
>>> > > > > accessing the tagged versions using the standard go tools. It's
>>> > > > > currently under investigation.
>>> > > > >
>>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>>> > > > > populates the go.mod file with a pseudo-version rather than the
>>> > > > > latest tag (v2.33.0). For example, the line looks like:
>>> > > > >
>>> > > > >   require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>>> > > > >
>>> > > > > While this works, it's not the desired experience for users at this
>>> > > > > point. The current downside is that, for some reason, the releases
>>> > > > > are not meaningful targets. However, we retain the other benefits of
>>> > > > > Go Modules (actual dependency versioning, management by the go
>>> > > > > tools).
>>> > > > >
>>> > > > > The issue is some combination of the go tooling [A], the fact that
>>> > > > > we added a go.mod file outside of the repo root [B], and the fact
>>> > > > > that we did not increment the major version (v2 -> v3) when adding
>>> > > > > the go.mod file [C].
>>> > > > >
>>> > > > > [B] According to the Go documentation, this should be legal and
>>> > > > > fine, even if it's not recommended. This is fortunate, because the
>>> > > > > root of the repo would have played poorly with the root vendor
>>> > > > > directory, which the go tools have opinions on.
>>> > > > >
>>> > > > > [C] The Go Modules documentation recommends incrementing the major
>>> > > > > version when transitioning to Go Modules. However, it never says
>>> > > > > this is required, nor does it describe the current failure mode. If
>>> > > > > anything, this should be documented in those docs, if it's not
>>> > > > > another bug. We would not necessarily want to declare a global v3
>>> > > > > for Beam at this time just for the Go SDK; it would become
>>> > > > > confusing rather quickly. Notionally there are some larger breaking
>>> > > > > changes the Java and Python SDKs would want to make in such an
>>> > > > > event, so it's a larger conversation that is out of scope at this
>>> > > > > time.
>>> > > > >
>>> > > > > This leaves [A], where some misunderstanding of the documented
>>> > > > > semantics occurred. I certainly expected the tagged version of the
>>> > > > > non-root Go module to be inherited from the parent, not wholesale
>>> > > > > ignored. As a result, I'll be filing a bug against the go tools to
>>> > > > > determine this, and see what paths forward exist.
>>> > > > >
>>> > > > > It's my hope to resolve this before we write a proper Experimental
>>> > > > > Exit blog post for the Go SDK.
>>> > > > >
>>> > > > > Thank you for your patience, and time.
>>> > > > > Robert Burke
>>> > > > > Beam Go Busybody
>>> > > > >
>>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses
>>> > > > > > Go Modules for dependency management, simplifying Go SDK
>>> > > > > > contributions. [2]
>>> > > > > >
>>> > > > > > The module file lives in the sdks/ directory, so there's a single Go
>>> > > > > > module for the whole SDK, tests, examples, and any support code for
>>> > > > > > the container boot builds. This excludes the Go modules of the Go
>>> > > > > > SDK code katas [3], which can be updated once 2.33.0 has been
>>> > > > > > released.
>>> > > > > >
>>> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and
>>> > > > > > by default uses the release-specific container for Docker execution
>>> > > > > > jobs. For at least the 2.33.0 release, this means that manual
>>> > > > > > validation will need to explicitly specify RC versions of
>>> > > > > > containers. However, given that the Go SDK container and worker
>>> > > > > > boot process rarely change, this is unlikely to be an issue.
>>> > > > > >
>>> > > > > > At present I'm cleaning up some of the references to experimental,
>>> > > > > > and making it clear that 2.33.0 is the first non-experimental
>>> > > > > > release (even though it's 4-6 weeks out from actual release).
>>> > > > > > CHANGES.md will be updated to note the event, but a larger blog post
>>> > > > > > will happen after the release goes public.
>>> > > > > >
>>> > > > > > Cheers,
>>> > > > > > Robert Burke
>>> > > > > > Defacto Beam Go TL.
>>> > > > > >
>>> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>>> > > > > > [2] https://github.com/apache/beam/pull/15323
>>> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
>>> > > > > > [4] https://github.com/apache/beam/pull/15365
>>> > > > > >
>>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>>> > > > > > > +1, congratulations & thank you!
>>> > > > > > >
>>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org>
>>> > > > > > > wrote:
>>> > > > > > >
>>> > > > > > > > Regarding the documentation update: the initial PR is
>>> > > > > > > > https://github.com/apache/beam/pull/15057, which goes up to
>>> > > > > > > > section ~4.3.
>>> > > > > > > > JIRA link for Programming Guide changes:
>>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > >
>>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
>>> > > > > > > > > Yup!
>>> > > > > > > > >
>>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully
>>> > > > > > > > > into the Beam Programming Guide. I've audited the guide, and am
>>> > > > > > > > > beginning to add missing content and fill in the Go-specific
>>> > > > > > > > > gaps. This will be tied to improving the GoDoc with more
>>> > > > > > > > > Go-specific user documentation that isn't appropriate for the
>>> > > > > > > > > BPG.
>>> > > > > > > > >
>>> > > > > > > > > My audit of the guide is here:
>>> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>>> > > > > > > > >
>>> > > > > > > > > The other sheets focus on features and tests. The feature page
>>> > > > > > > > > looks worse than it is, as it was more productive to focus on
>>> > > > > > > > > what isn't available than on what is. That's a snapshot of my
>>> > > > > > > > > actual working sheet, but I'll be updating it as needed.
>>> > > > > > > > >
>>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <
>>> iemejia@gmail.com>
>>> > > > > wrote:
>>> > > > > > > > >
>>> > > > > > > > > > Oups forgot to write one question. Will this come with
>>> revamped
>>> > > > > > > > > > website instructions/doc for golang too?
>>> > > > > > > > > >
>>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <
>>> iemejia@gmail.com>
>>> > > > > > > > wrote:
>>> > > > > > > > > > >
>>> > > > > > > > > > > Huge +1
>>> > > > > > > > > > >
>>> > > > > > > > > > > This is definitely something many people have asked
>>> about, so
>>> > > > > it is
>>> > > > > > > > > > > great to see it finally happening.
>>> > > > > > > > > > >
>>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <
>>> > > > > kenn@apache.org>
>>> > > > > > > > wrote:
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > +1 awesome
>>> > > > > > > > > > > >
>>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <
>>> > > > > lostluck@apache.org
>>> > > > > > > > >
>>> > > > > > > > > > wrote:
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to
>>> get those (Go
>>> > > > > > > > modules
>>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut, and
>>> certainly
>>> > > > > before the
>>> > > > > > > > 2.33
>>> > > > > > > > > > cut if release images aren't added to the 2.32 process.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Regarding Go Generics: at some point in the
>>> future, we may
>>> > > > > want a
>>> > > > > > > > > > harder break between a newer generics-first API and
>>> the
>>> > > > > current
>>> > > > > > > > version,
>>> > > > > > > > > > but there's no rush. Generics/TypeParameters in Go
>>> aren't
>>> > > > > identical to
>>> > > > > > > > the
>>> > > > > > > > > > feature referred to by that term in Java, C++, Rust,
>>> etc, so
>>> > > > > it'll
>>> > > > > > > > take a
>>> > > > > > > > > > bit of time for that expertise to develop.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> However, by the current nature of Go, we had to
>>> have pretty
>>> > > > > > > > > > sophisticated reflective analysis to handle DoFns and
>>> map them
>>> > > > > to their
>>> > > > > > > > > > graph inputs. So, adding new helpers like KV,
>>> emitter, and
>>> > > > > iterator
>>> > > > > > > > > > types shouldn't be too difficult. Changes to Go SDK
>>> internals to
>>> > > > > use
>>> > > > > > > > > > generics (such as the implementation of stats DoFns like
>>> Min, Max,
>>> > > > > etc.)
>>> > > > > > > > could
>>> > > > > > > > > > also be made transparently to most users,
>>> and
>>> > > > > certainly any
>>> > > > > > > > of
>>> > > > > > > > > > the framework for execution time handling (the
>>> "worker's SDK
>>> > > > > harness")
>>> > > > > > > > > > would be able to be cleaned up if need be. Finally,
>>> adding more
>>> > > > > > > > > > sophisticated DoFn registration and code generation
>>> would be
>>> > > > > able to
>>> > > > > > > > > > replace the optional code generator entirely, saving
>>> some users
>>> > > > > a `go
>>> > > > > > > > > > generate` step and making improved execution
>>> > > > > performance easier to get.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Changing things like making a type-parameterized
>>> > > > > PCollection
>>> > > > > > > > would
>>> > > > > > > > > > be far more involved, as would trying to use some kind
>>> of Apply
>>> > > > > > > > format. The
>>> > > > > > > > > > lack of Method Overrides prevents the apply chaining
>>> approach.
>>> > > > > Or at
>>> > > > > > > > least
>>> > > > > > > > > > prevents it from working simply.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Finally, Go Generics won't be available until Go
>>> 1.18,
>>> > > > > which isn't
>>> > > > > > > > > > until next year. See
>>> https://blog.golang.org/generics-proposal
>>> > > > > for
>>> > > > > > > > > > details.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does
>>> include a
>>> > > > > register-based
>>> > > > > > > > > > calling convention, leading to a modest performance
>>> improvement
>>> > > > > across
>>> > > > > > > > the
>>> > > > > > > > > > board.
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> Cheers,
>>> > > > > > > > > > > >> Robert Burke
>>> > > > > > > > > > > >>
>>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>>> > > > > robertwb@google.com>
>>> > > > > > > > wrote:
>>> > > > > > > > > > > >> > +1 to declaring Golang support out of
>>> experimental once
>>> > > > > the Go
>>> > > > > > > > > > Modules
>>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs
>>> to support
>>> > > > > every
>>> > > > > > > > > > feature
>>> > > > > > > > > > > >> > to be accepted, especially now that we can do
>>> > > > > cross-language
>>> > > > > > > > > > > >> > transforms, and Go definitely supports enough
>>> to be quite
>>> > > > > > > > useful.
>>> > > > > > > > > > (WRT
>>> > > > > > > > > > > >> > streaming, my understanding is that Go supports
>>> the
>>> > > > > streaming
>>> > > > > > > > model
>>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on a
>>> streaming
>>> > > > > > > > runner,
>>> > > > > > > > > > even
>>> > > > > > > > > > > >> > if more advanced features like state and timers
>>> aren't yet
>>> > > > > > > > > > available.)
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > This is a great milestone.
>>> > > > > > > > > > > >> >
>>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton
>>> <
>>> > > > > > > > tysonjh@google.com>
>>> > > > > > > > > > wrote:
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > > WOW! Big news.
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status
>>> after Go
>>> > > > > Modules
>>> > > > > > > > > > are completed and the LICENSE issue is resolved. I
>>> don't think
>>> > > > > that
>>> > > > > > > > lacking
>>> > > > > > > > > > streaming support is a blocker. The other thing I
>>> checked to see
>>> > > > > was if
>>> > > > > > > > > > there were metrics available on
>>> metrics.beam.apache.org,
>>> > > > > specifically
>>> > > > > > > > for
>>> > > > > > > > > > measuring code health via post-commit over time, which
>>> there are
>>> > > > > and
>>> > > > > > > > the
>>> > > > > > > > > > passing test rate is high (Huzzah!). The one thing that
>>> > > > > surprised me
>>> > > > > > > > from
>>> > > > > > > > > > your summary is that when Go introduces generics it
>>> won't result
>>> > > > > in any
>>> > > > > > > > > > backwards incompatible changes in Apache Beam. That's
>>> great
>>> > > > > news, but
>>> > > > > > > > does
>>> > > > > > > > > > it mean there will be a need to support both
>>> non-generic and
>>> > > > > generic
>>> > > > > > > > APIs
>>> > > > > > > > > > moving forward? It seems like generics will be
>>> introduced in the
>>> > > > > Go
>>> > > > > > > > 1.17
>>> > > > > > > > > > release (optimistically) in August this year.
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> > >
>>> > > > > > > > > > > >> >
>>> > > > > > > > > >
>>> > > > > > > > >
>>> > > > > > > >
>>> > > > > > >
>>> > > > > >
>>> > > > >
>>> > > >
>>> > >
>>>
>>

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
Investigation showed that there's no way around the prefixed tags. I've
added the explanation as a comment on the JIRA.
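The rule behind these prefixed tags (from the Go modules reference, quoted further down in this thread) is mechanical enough to sketch. A minimal Go version, assuming the layout described in the thread, where go.mod lives in sdks/ and the /v2 suffix is not a real directory: strip the repository root from the module path, drop any trailing major-version element, and prefix the version with whatever subdirectory is left. `tagFor` and its parameters are hypothetical names for illustration; this is not part of Beam's release scripts.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major-version path element such as "/v2".
// Go module paths use this suffix only for major versions 2 and above.
var majorSuffix = regexp.MustCompile(`/v([2-9]|[1-9][0-9]+)$`)

// tagFor returns the tag name the go command would look for when resolving
// version of the module at modulePath, hosted in the repository whose root
// path is repoRoot. A module in a subdirectory needs the subdirectory as a
// tag prefix; a module at the repository root uses the plain version tag.
func tagFor(repoRoot, modulePath, version string) (string, error) {
	if modulePath != repoRoot && !strings.HasPrefix(modulePath, repoRoot+"/") {
		return "", fmt.Errorf("module %q is not inside repository %q", modulePath, repoRoot)
	}
	// The module subdirectory is the part of the path after the repository
	// root, minus any major-version suffix (go.mod for .../sdks/v2 lives in sdks/).
	sub := strings.TrimPrefix(modulePath, repoRoot)
	sub = majorSuffix.ReplaceAllString(sub, "")
	sub = strings.TrimPrefix(sub, "/")
	if sub == "" {
		return version, nil
	}
	return sub + "/" + version, nil
}

func main() {
	for _, v := range []string{"v2.33.0", "v2.34.0-RC1"} {
		tag, err := tagFor("github.com/apache/beam", "github.com/apache/beam/sdks/v2", v)
		if err != nil {
			panic(err)
		}
		fmt.Println(tag)
	}
}
```

Run as-is, it should print sdks/v2.33.0 and then sdks/v2.34.0-RC1, the tag names proposed in this thread for 2.33.0 and 2.34.0 RC1; a module at the repository root would get a plain tag such as v2.33.0.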

https://github.com/apache/beam/pull/15881 has the release script updates.

I'm working on the Exit blog post and the updated Go SDK roadmap.

Since 2.34.0 is almost out (assuming RC1 verification goes well), I'm
inclined to wait for that release to finish before publishing the blog post.
I'll link the draft PR here as soon as it's ready.

Once 2.34.0 is released, I'd still like to give 2.33.0 a prefixed tag as
well, so there isn't a gap in versions between the non-module code and the
module code.

Once the blog post is published, that'll be the end of this thread.

Thank you very much everyone.

Robert Burke
Beam Go Busybody

On Tue, Oct 26, 2021, 5:36 PM Kyle Weaver <kc...@google.com> wrote:

> +1 to extra tags. They'll be trivial to add to our release process, and
> git tags are lightweight by design so I don't foresee any problems.
>
> On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com>
> wrote:
>
>> Glad you were able to figure it out. The extra tags are certainly
>> worth making this work if it's what we have to do, and shouldn't be
>> too much of a problem (until, hopefully, it's fixed on the go side).
>>
>> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
>> >
>> > With Kyle's help with the additional tagging of the next RC, we have
>> validated that this is the currently correct approach.
>> >
>> >
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
>> >
>> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>> >
>> > Or even:
>> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam  (links
>> to latest tagged version)
>> >
>> > The main cost to this approach is doubling the number of tags in the
>> tags list: https://github.com/apache/beam/tags which is not ideal, but
>> overall a small cost. There's no need for "full publish" of these
>> additional tags, so we won't be doubling our "releases" (see
>> https://github.com/apache/beam/releases).
>> >
>> > I'll still be filing a bug against the Go commands, since the mandatory
>> prefixing is unintuitive and seems unnecessary. If the Go tooling stops
>> requiring it, we can always delete the tags from the affected branches and
>> stop adding them going forward. I'll search through the existing Go issues
>> first, however, to see if this has been discussed before, and report my
>> findings here either way.
>> >
>> > This does require two small changes to the release guide: the RC tagging
>> script, and the final tagging step:
>> >
>> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>> >
>> >
>> https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>> >
>> > I'll make this change later this week (or early next) assuming there
>> are no objections.
>> >
>> > Thank you all very much for your patience,
>> > Robert Burke
>> > Beam Go Busybody
>> >
>> >
>> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
>> > > After much reading of the Go Modules documentation, I have
>> confirmed what the issue is.
>> > >
>> > > We added the go.mod file to sdks/ under the repo root because it's a
>> cleaner spot for the change, captures the Java and Python container boot
>> code (written in Go) into the module and avoids conflicts in
>> interpretations of the vendor directory that lives at the root level.
>> > >
>> > > However, we missed that when doing so, the standard version tags
>> would only apply to modules at the root level, not to modules in
>> subdirectories. See https://golang.org/ref/mod#vcs-version, but quoting
>> the important paragraph:
>> > >
>> > > > If a module is defined in a subdirectory within the repository,
>> that is, the module subdirectory portion of
>> > > > the module path is not empty, then each tag name must be prefixed
>> with the module subdirectory,
>> > > > followed by a slash. For example, the module
>> golang.org/x/tools/gopls is defined in the gopls
>> > > > subdirectory of the repository with root path golang.org/x/tools.
>> The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in
>> that repository.
>> > >
>> > > Specifically, for the Go SDK to be able to be fetched at the right
>> version, we need to have prefixed tags like "sdks/v2.33.0" or
>> "sdks/v2.34.0-RC1"
>> > >
>> > > So, the fix for the Go versioning issue is to amend our Release
>> process (including generating Release Candidate builds) to also add a
>> prefixed version tag with the same version.
>> > >
>> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there
>> are no objections, we can retroactively add such a prefixed tag to the
>> 2.33.0 release branch, at which point I can also write the official
>> Experimental Exit blog post.
>> > >
>> > > Thank you all for your patience.
>> > > Robert Burke
>> > >
>> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
>> > > > Thank you for the detailed update! Let us know if we can help.
>> > > >
>> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org>
>> wrote:
>> > > >
>> > > > > This is a status update.
>> > > > >
>> > > > > At this point 2.33.0 is released, but there are difficulties with
>> > > > > accessing the tagged versions using the standard go tools. It's
>> currently
>> > > > > under investigation.
>> > > > >
>> > > > > Using the v2 path in a Go program and then running `go mod tidy`
>> > > > > populates go.mod with a pseudo-version rather than the latest tag
>> > > > > (v2.33.0). For example, the line looks like:
>> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>> > > > >
>> > > > > While this works, it's not the desired experience for users at this
>> > > > > point. The current downside is that, for some reason, the releases
>> > > > > aren't meaningful version targets. However, we retain the other
>> > > > > benefits of Go Modules (actual dependency versioning, management by
>> > > > > the go tools).
>> > > > >
>> > > > > The issue is some combination of the go tooling [A] , that we
>> added a go
>> > > > > mod file outside of the repo root [B], and that we did not
>> increment the
>> > > > > major version (v2 -> v3) when adding the go mod file [C].
>> > > > >
>> > > > > [B] From the go documentation, this should be legal and fine,
>> even if it's
>> > > > > not recommended. This is fortunate because the root of the repo
>> would have
>> > > > > played poorly with root vendor directory, which the go tools have
>> opinions
>> > > > > on.
>> > > > >
>> > > > > [C] Incrementing the major version is recommended, in the Go
>> Modules
>> > > > > documentation, when transitioning to Go Modules. However, it
>> never said it
>> > > > > was required, nor did it indicate this current failure mode. If
>> anything
>> > > > > this should be documented in those docs, if it's not another bug.
>> We would
>> > > > > not necessarily want to declare a global v3 for beam at this
>> time, for just
>> > > > > the Go SDK, it would become confusing rather quickly. Notionally
>> there are
>> > > > > some larger breaking changes the Java and Python SDKs would want
>> to make in
>> > > > > such an event, and thus it's a larger conversation, that is out
>> of scope at
>> > > > > this time.
>> > > > >
>> > > > > This leaves [A] where some mis-understanding of the documented
>> semantics
>> > > > > occurred. I certainly expected the tagged version of the non-root
>> go-module
>> > > > > to be inherited from the parent, not wholesale ignored. As a
>> result, I'll
>> > > > > be filing a bug against the go tools to determine this, and see
>> what paths
>> > > > > forward exist.
>> > > > >
>> > > > > It's my hope to resolve this before we write a proper
>> Experimental Exit
>> > > > > blog post for the Go SDK.
>> > > > >
>> > > > > Thank you for your patience, and time.
>> > > > > Robert Burke
>> > > > > Beam Go Busybody
>> > > > >
>> > > > >
>> > > > >
>> > > > >
>> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
>> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now
>> uses Go
>> > > > > Modules for dependency management, simplifying Go SDK
>> contributions. [2]
>> > > > > >
>> > > > > > The Module file lives in the sdks/ directory so there's a
>> single Go
>> > > > > Module for the whole SDK, tests, examples, and any support code
>> for the
>> > > > > container boot builds. This excludes the Go SDK Code katas [3] go
>> modules
>> > > > > which can be updated once 2.33.0 has been released.
>> > > > > >
>> > > > > > PR 15365 [4] adds the SDK containers back to the release
>> builds, and
>> > > > > default uses the release specific container for docker execution
>> jobs. For
>> > > > > at least the 2.33.0 release this does mean that manual
>> validation will
>> > > > > need to explicitly specify RC versions of containers. However,
>> given that
>> > > > > the Go SDK container and worker boot process rarely changes, this
>> is
>> > > > > unlikely to be an issue.
>> > > > > >
>> > > > > > At present I'm cleaning up some of the references to
>> experimental, and
>> > > > > making it clear that 2.33.0 is the first non-experimental release
>> (even
>> > > > > though that's 4-6 weeks out from actual release.) CHANGES.md
>> will be
>> > > > > updated to note the event, but a larger blogpost will happen
>> after the
>> > > > > release goes public.
>> > > > > >
>> > > > > > Cheers,
>> > > > > > Robert Burke
>> > > > > > Defacto Beam Go TL.
>> > > > > >
>> > > > > > [1]
>> > > > >
>> https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
>> > > > > > [2] https://github.com/apache/beam/pull/15323
>> > > > > > [3]
>> https://github.com/apache/beam/tree/master/learning/katas/go
>> > > > > > [4] https://github.com/apache/beam/pull/15365
>> > > > > >
>> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
>> > > > > > > +1, congratulations & thank you!
>> > > > > > >
>> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <
>> lostluck@apache.org>
>> > > > > wrote:
>> > > > > > >
>> > > > > > > > Regarding documentation update: Initial PR is
>> > > > > > > > https://github.com/apache/beam/pull/15057 which goes up to
>> section
>> > > > > ~4.3.
>> > > > > > > > JIRA link for Programming Guide changes:
>> > > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
>> > > > > > > >
>> > > > > > > >
>> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com>
>> wrote:
>> > > > > > > > > Yup!
>> > > > > > > > >
>> > > > > > > > > My immediate plan is to work on incorporating the Go SDK
>> fully
>> > > > > into the
>> > > > > > > > > Beam Programming Guide. I've audited the guide, and
>> > > > > > > > > am beginning to add missing content and filling in the Go
>> specific
>> > > > > gaps.
>> > > > > > > > > This will be tied to improving the Go Doc with more Go
>> > > > > > > > > specific user documentation that isn't appropriate for
>> the BPG.
>> > > > > > > > >
>> > > > > > > > > My audit of the guide is here:
>> > > > > > > > >
>> > > > > > > >
>> > > > >
>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>> > > > > > > > >
>> > > > > > > > > The other sheets focus on features and tests. The feature
>> page
>> > > > > looks
>> > > > > > > > worse
>> > > > > > > > > than it is, as it was more productive to focus on what
>> isn't
>> > > > > available
>> > > > > > > > than
>> > > > > > > > > what is. That's a snapshot of my actual working sheet but
>> I'll be
>> > > > > > > > updating
>> > > > > > > > > it as needed.
>> > > > > > > > >
>> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <
>> iemejia@gmail.com>
>> > > > > wrote:
>> > > > > > > > >
>> > > > > > > > > > Oops, forgot to write one question. Will this come with
>> revamped
>> > > > > > > > > > website instructions/doc for golang too?
>> > > > > > > > > >
>> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <
>> iemejia@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > > > > >
>> > > > > > > > > > > Huge +1
>> > > > > > > > > > >
>> > > > > > > > > > > This is definitely something many people have asked
>> about, so
>> > > > > it is
>> > > > > > > > > > > great to see it finally happening.
>> > > > > > > > > > >
>> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <
>> > > > > kenn@apache.org>
>> > > > > > > > wrote:
>> > > > > > > > > > > >
>> > > > > > > > > > > > +1 awesome
>> > > > > > > > > > > >
>> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <
>> > > > > lostluck@apache.org
>> > > > > > > > >
>> > > > > > > > > > wrote:
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get
>> those (Go
>> > > > > > > > modules
>> > > > > > > > > > and LICENSE issue) done before the 2.32 cut, and
>> certainly
>> > > > > before the
>> > > > > > > > 2.33
>> > > > > > > > > > cut if release images aren't added to the 2.32 process.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Regarding Go Generics: at some point in the
>> future, we may
>> > > > > want a
>> > > > > > > > > > harder break between a newer Generic first API and
>> the
>> > > > > current
>> > > > > > > > version,
>> > > > > > > > > > but there's no rush. Generics/TypeParameters in Go
>> aren't
>> > > > > identical to
>> > > > > > > > the
>> > > > > > > > > > feature referred to by that term in Java, C++, Rust,
>> etc, so
>> > > > > it'll
>> > > > > > > > take a
>> > > > > > > > > > bit of time for that expertise to develop.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> However, by the current nature of Go, we had to
>> have pretty
>> > > > > > > > > > sophisticated reflective analysis to handle DoFns and
>> map them
>> > > > > to their
>> > > > > > > > > > graph inputs. So, adding new helpers like a KV,
>> emitter, and
>> > > > > Iterator
>> > > > > > > > > > types, shouldn't be too difficult. Changing Go SDK
>> internals to
>> > > > > use
>> > > > > > > > > > generics (like the implementation of Stats DoFns like
>> Min, Max,
>> > > > > etc)
>> > > > > > > > would
>> > > > > > > > > > also be able to be made transparently to most users, and
>> > > > > certainly any
>> > > > > > > > of
>> > > > > > > > > > the framework for execution time handling (the
>> "worker's SDK
>> > > > > harness")
>> > > > > > > > > > would be able to be cleaned up if need be. Finally,
>> adding more
>> > > > > > > > > > sophisticated DoFn registration and code generation
>> would be
>> > > > > able to
>> > > > > > > > > > replace the optional code generator entirely, saving
>> some users
>> > > > > a `go
>> > > > > > > > > > generate` step, simplifying getting improved execution
>> > > > > performance.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Changing things like making a Type Parameterized
>> > > > > PCollection,
>> > > > > > > > would
>> > > > > > > > > > be far more involved, as would trying to use some kind
>> of Apply
>> > > > > > > > format. The
>> > > > > > > > > > lack of Method Overrides prevents the apply chaining
>> approach.
>> > > > > Or at
>> > > > > > > > least
>> > > > > > > > > > prevents it from working simply.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Finally, Go Generics won't be available until Go
>> 1.18,
>> > > > > which isn't
>> > > > > > > > > > until next year. See
>> https://blog.golang.org/generics-proposal
>> > > > > for
>> > > > > > > > > > details.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does
>> include a
>> > > > > Register
>> > > > > > > > > > calling convention, leading to a modest performance
>> improvement
>> > > > > across
>> > > > > > > > the
>> > > > > > > > > > board.
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> Cheers,
>> > > > > > > > > > > >> Robert Burke
>> > > > > > > > > > > >>
>> > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <
>> > > > > robertwb@google.com>
>> > > > > > > > wrote:
>> > > > > > > > > > > >> > +1 to declaring Golang support out of
>> experimental once
>> > > > > the Go
>> > > > > > > > > > Modules
>> > > > > > > > > > > >> > issues are solved. I don't think an SDK needs to
>> support
>> > > > > every
>> > > > > > > > > > feature
>> > > > > > > > > > > >> > to be accepted, especially now that we can do
>> > > > > cross-language
>> > > > > > > > > > > >> > transforms, and Go definitely supports enough to
>> be quite
>> > > > > > > > useful.
>> > > > > > > > > > (WRT
>> > > > > > > > > > > >> > streaming, my understanding is that Go supports
>> the
>> > > > > streaming
>> > > > > > > > model
>> > > > > > > > > > > >> > with windows and timestamps, and runs fine on a
>> streaming
>> > > > > > > > runner,
>> > > > > > > > > > even
>> > > > > > > > > > > >> > if more advanced features like state and timers
>> aren't yet
>> > > > > > > > > > available.)
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > This is a great milestone.
>> > > > > > > > > > > >> >
>> > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <
>> > > > > > > > tysonjh@google.com>
>> > > > > > > > > > wrote:
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > > WOW! Big news.
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > > I'm supportive of leaving experimental status
>> after Go
>> > > > > Modules
>> > > > > > > > > > are completed and the LICENSE issue is resolved. I
>> don't think
>> > > > > that
>> > > > > > > > lacking
>> > > > > > > > > > streaming support is a blocker. The other thing I
>> checked to see
>> > > > > was if
>> > > > > > > > > > there were metrics available on metrics.beam.apache.org
>> ,
>> > > > > specifically
>> > > > > > > > for
>> > > > > > > > > > measuring code health via post-commit over time, which
>> there are
>> > > > > and
>> > > > > > > > the
>> > > > > > > > > > passing test rate is high (Huzzah!). The one thing that
>> > > > > surprised me
>> > > > > > > > from
>> > > > > > > > > > your summary is that when Go introduces generics it
>> won't result
>> > > > > in any
>> > > > > > > > > > backwards incompatible changes in Apache Beam. That's
>> great
>> > > > > news, but
>> > > > > > > > does
>> > > > > > > > > > it mean there will be a need to support both
>> non-generic and
>> > > > > generic
>> > > > > > > > APIs
>> > > > > > > > > > moving forward? It seems like generics will be
>> introduced in the
>> > > > > Go
>> > > > > > > > 1.17
>> > > > > > > > > > release (optimistically) in August this year.
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > >
>> > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <
>> > > > > > > > lostluck@apache.org>
>> > > > > > > > > > wrote:
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Hello Beam Community!
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go
>> SDK
>> > > > > > > > experimental.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> This thread is to discuss it as a community,
>> and any
>> > > > > > > > conditions
>> > > > > > > > > > that remain that would prevent the exit.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> tl;dr;
>> > > > > > > > > > > >> > >> Ask Questions for answers and links! I have
>> both.
>> > > > > > > > > > > >> > >> This entails including it officially in the
>> Release
>> > > > > process,
>> > > > > > > > > > removing the various "experimental" text throughout the
>> repo etc,
>> > > > > > > > > > > >> > >> and otherwise treating it like Python and
>> Java. Some Go
>> > > > > > > > specific
>> > > > > > > > > > tasks around dep versioning.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The Go SDK implements the beam model
>> efficiently for
>> > > > > most
>> > > > > > > > batch
>> > > > > > > > > > tasks, including basic windowing.
>> > > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are
>> tested on all
>> > > > > > > > Portable
>> > > > > > > > > > runners.
>> > > > > > > > > > > >> > >> The core APIs are not going to change in
>> incompatible
>> > > > > ways
>> > > > > > > > going
>> > > > > > > > > > forward.
>> > > > > > > > > > > >> > >> Scalable transforms can be written through
>> > > > > SplittableDoFns or
>> > > > > > > > > > via Cross Language transforms.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but
>> keeping it
>> > > > > > > > experimental
>> > > > > > > > > > doesn't help with that any further.
>> > > > > > > > > > > >> > >> Communities grow through contributions and
>> use, and
>> > > > > > > > experimental
>> > > > > > > > > > markers dissuade users.
>> > > > > > > > > > > >> > >> There's plenty to do in order to expand what can
>> be done
>> > > > > with
>> > > > > > > > the
>> > > > > > > > > > SDK. (Contributions welcome)
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Why Exit Experimental now?
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Typically when we call an SDK or API
>> Experimental, it's
>> > > > > > > > because
>> > > > > > > > > > there's a risk that API or behaviors may change
>> significantly.
>> > > > > > > > > > > >> > >> This in turn, leads to additional work for
>> users of
>> > > > > the SDK
>> > > > > > > > on
>> > > > > > > > > > every release which leads to sticking to older versions
>> or
>> > > > > forking
>> > > > > > > > > > > >> > >> to preserve behavior. Version updates should
>> be looked
>> > > > > > > > forward
>> > > > > > > > > > to, and viewed as having little risk. Further while
>> there's been
>> > > > > > > > > > > >> > >> previous discussion about what the "low bar"
>> is for a
>> > > > > new
>> > > > > > > > SDK, it
>> > > > > > > > > > hasn't been summarily applied to the Go SDK. I feel
>> this has
>> > > > > > > > > > > >> > >> hurt development and contribution of new SDK
>> languages
>> > > > > > > > (inherent
>> > > > > > > > > > difficulty of SDK development notwithstanding).
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely
>> clear
>> > > > > what the
>> > > > > > > > > > Beam Model should look like in an opinionated language
>> like Go.
>> > > > > > > > > > > >> > >> Their initial take (see
>> > > > > > > > > > https://s.apache.org/beam-go-sdk-design-rfc [0]) goes
>> into
>> > > > > detail
>> > > > > > > > what it
>> > > > > > > > > > means for a language without
>> > > > > > > > > > > >> > >> Generics, or overloading, or inheritance to
>> implement
>> > > > > the
>> > > > > > > > beam
>> > > > > > > > > > model. One could largely throw away static types (like
>> Python),
>> > > > > > > > > > > >> > >> but this approach rings hollow for Go. It
>> would not do
>> > > > > if the
>> > > > > > > > > > approach couldn't grow and scale to the Beam Model.
>> It's also
>> > > > > hard
>> > > > > > > > > > > >> > >> to tell if an API is any good before there
>> are users.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Further, in the early days of Portability,
>> there
>> > > > > wasn't a
>> > > > > > > > way to
>> > > > > > > > > > write scalable DoFns, dynamically or otherwise. It's an
>> > > > > incredible
>> > > > > > > > > > > >> > >> bottleneck to need to do all initial fanout
>> of work on
>> > > > > a
>> > > > > > > > single
>> > > > > > > > > > machine, write everything to a Reshuffle, just in order
>> to scale
>> > > > > up.
>> > > > > > > > > > > >> > >> Without being able to scale, Beam is little
>> more than
>> > > > > > > > overhead.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> At this point, both of these needs are met
>> within the
>> > > > > Go SDK
>> > > > > > > > for
>> > > > > > > > > > open source.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Background
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The Go SDK has been a part of the beam repo
>> for a few
>> > > > > years
>> > > > > > > > now,
>> > > > > > > > > > since it was accidentally merged into master.
>> > > > > > > > > > > >> > >> Since then it's been called experimental, and
>> not
>> > > > > officially
>> > > > > > > > > > part of the releases.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Of the SDKs, it was always designed around
>> Beam
>> > > > > Portability
>> > > > > > > > > > first. It never had any "Legacy" (SDK x Runner specific
>> )
>> > > > > workers.
>> > > > > > > > > > > >> > >> It's always used the Beam Pipeline protos and
>> FnAPI to
>> > > > > > > > execute
>> > > > > > > > > > jobs, first with some very experimental code on
>> Dataflow, but now
>> > > > > > > > > > > >> > >> on all portable supported runners, like
>> Flink, Spark,
>> > > > > the
>> > > > > > > > Python
>> > > > > > > > > > Portable runner, and Dataflow.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> API Stability
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its
>> user API
>> > > > > for DoFn
>> > > > > > > > > > and pipeline construction since it was first merged in,
>> and
>> > > > > there are
>> > > > > > > > no
>> > > > > > > > > > > >> > >> changes to that on the horizon that can't be
>> made in a
>> > > > > > > > backwards
>> > > > > > > > > > compatible manner. Largely these are related to New
>> Features, or
>> > > > > > > > > > > >> > >> usability improvements enabled by the advent
>> of Go
>> > > > > Generics
>> > > > > > > > > > (think of "real" KV, emitter, and iterator types).
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> It's an open secret that the Go SDK has
>> largely been
>> > > > > under
>> > > > > > > > work
>> > > > > > > > > > for use within Google. Its use is called FlumeGo,
>> representing
>> > > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top of
>> Flume,
>> > > > > Google's
>> > > > > > > > batch
>> > > > > > > > > > pipeline processing engine. Thus most of the focus has been on
>> improving
>> > > > > > > > > > > >> > >> batch execution. FlumeGo sees ample use
>> today, and
>> > > > > there
>> > > > > > > > hasn't
>> > > > > > > > > > been a call for fundamental changes to the API for
>> ergonomic or
>> > > > > > > > > > > >> > >> usability concerns.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Scalability
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Google could get away without the Go SDK
>> having an SDK
>> > > > > side
>> > > > > > > > > > scalability solution as a result of its integration
>> with Flume.
>> > > > > > > > > > > >> > >> However, those days are now past.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along
>> with
>> > > > > Dynamic
>> > > > > > > > > > Splitting, which supports writing scalable batch
>> transforms
>> > > > > natively
>> > > > > > > > > > > >> > >> in the Go SDK.
>> > > > > > > > > > > >> > >> The SDK also supports Cross Language
>> Transforms, with
>> > > > > Beam
>> > > > > > > > > > Schema encodings. With it, production hardened
>> transforms
>> > > > > > > > > > > >> > >> from Java and Python are a wrapper away.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented
>> the SDF
>> > > > > side
>> > > > > > > > work,
>> > > > > > > > > > and completed the Xlang work,) is adding a wrapper for
>> the
>> > > > > > > > > > > >> > >> Java Kafka IO using Cross Language
>> Transforms, which
>> > > > > has often
>> > > > > > > > > > been requested. This will also enable use of the Beam
>> SQL
>> > > > > > > > > > > >> > >> transforms that Java enables.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Features
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The Go SDK implements the Beam core. The Go
>> SDK
>> > > > > implements
>> > > > > > > > > > standard coders, allows for user DoFns, and CombineFns
>> and access
>> > > > > > > > > > > >> > >> to core transforms like Flatten, GroupByKey,
>> and
>> > > > > features
>> > > > > > > > like
>> > > > > > > > > > Side Inputs, Windowing, and User Metrics.
>> > > > > > > > > > > >> > >> Basic windowing will be fully supported for
>> batch even
>> > > > > > > > through
>> > > > > > > > > > lifted combines in the 2.32.0 release.
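Combiner lifting is what makes a CombineFn cheaper than a plain GroupByKey: each bundle is pre-combined into one small accumulator before the shuffle, and only the accumulators are merged afterwards. The sketch below defines a mean combiner whose method names follow the four-method CombineFn shape the Beam model describes (CreateAccumulator, AddInput, MergeAccumulators, ExtractOutput). The `liftedMean` driver is a hypothetical local simulation, not SDK or runner code, and it ignores keys and windows. A real runner would keep a separate accumulator per key and window.

```go
package main

import "fmt"

// meanAccum is the accumulator for a mean: a running sum and count.
type meanAccum struct {
	Sum   float64
	Count int64
}

// meanFn has the four CombineFn-style methods.
type meanFn struct{}

func (meanFn) CreateAccumulator() meanAccum { return meanAccum{} }

func (meanFn) AddInput(a meanAccum, v float64) meanAccum {
	a.Sum += v
	a.Count++
	return a
}

func (meanFn) MergeAccumulators(a, b meanAccum) meanAccum {
	return meanAccum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAccum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// liftedMean simulates combiner lifting: each bundle is pre-combined into a
// single accumulator, so only one small accumulator per bundle (rather than
// every element) would need to cross the shuffle before the final merge.
func liftedMean(bundles [][]float64) float64 {
	fn := meanFn{}
	partials := make([]meanAccum, 0, len(bundles))
	for _, b := range bundles {
		acc := fn.CreateAccumulator()
		for _, v := range b {
			acc = fn.AddInput(acc, v)
		}
		partials = append(partials, acc)
	}
	merged := fn.CreateAccumulator()
	for _, p := range partials {
		merged = fn.MergeAccumulators(merged, p)
	}
	return fn.ExtractOutput(merged)
}

func main() {
	fmt.Println(liftedMean([][]float64{{1, 2, 3}, {4, 5}, {6}})) // 3.5
}
```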
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> All of the above enables Beam Go to be
>> versatile for
>> > > > > batch
>> > > > > > > > > > execution on portable runners, and for simple streaming
>> > > > > pipelines.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Repo Testing
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> On precommit the Go SDK runs all its unit
>> tests. On
>> > > > > top of
>> > > > > > > > > > that, it runs all its integration tests against the
>> Python
>> > > > > Portable
>> > > > > > > > runner,
>> > > > > > > > > > > >> > >> making it quick and robust to detect breaking
>> changes
>> > > > > without
>> > > > > > > > > > overspending community resources. Those same tests are
>> also
>> > > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The tests are executable against all runners
>> via the
>> > > > > > > > appropriate
>> > > > > > > > > > Go commands (if you've stood up your own job management
>> server),
>> > > > > > > > > > > >> > >> or Gradle commands (which will spin up runner
>> > > > > instances for
>> > > > > > > > > > you). Documentation for executing tests and adding new
>> ones
>> > > > > > > > > > > >> > >> is on the wiki. [2] They are accessible to Go
>> > > > > developers as
>> > > > > > > > > > they're implemented with the standard Go testing tools.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Shortcomings
>> > > > > > > > > > > >> > >> That said, there's still much to do. Let me
>> briefly
>> > > > > tell you
>> > > > > > > > > > what doesn't work, and it's up to you to weigh whether
>> they block
>> > > > > > > > > > > >> > >> being out of experimental.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> At present, only textio has been
>> implemented as
>> > > > > Splittable
>> > > > > > > > > > DoFn.
>> > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will
>> serve as
>> > > > > the
>> > > > > > > > > > first example for future contributions for
>> > > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
>> > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at this
>> point
>> > > > > users are
>> > > > > > > > > > empowered to write their own DoFns or wrap existing
>> transforms
>> > > > > for
>> > > > > > > > Cross
>> > > > > > > > > > Language use.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> In the core SDK, more streaming focused
>> features have
>> > > > > yet to
>> > > > > > > > be
>> > > > > > > > > > implemented, but they're largely additions to what
>> exists already
>> > > > > > > > > > > >> > >> rather than total rebuilds. Much of the work
>> is
>> > > > > defining
>> > > > > > > > how a
>> > > > > > > > > > user specifies their desires, and turning those into the
>> > > > > appropriate
>> > > > > > > > > > > >> > >> FnAPI requests at execution time. Back in
>> October I
>> > > > > wrote at
>> > > > > > > > > > length on the wiki [1] what's missing for additional
>> streaming
>> > > > > > > > features.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> While we have bolstered our testing recently,
>> there's
>> > > > > likely
>> > > > > > > > > > still more we could test to improve our confidence in
>> the SDK,
>> > > > > > > > > > > >> > >> in particular regarding the included
>> transforms
>> > > > > libraries and
>> > > > > > > > > > examples.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Moving Forward
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> My immediate plan is to work on incorporating
>> the Go
>> > > > > SDK
>> > > > > > > > fully
>> > > > > > > > > > into the Beam Programming Guide. I've audited the guide
>> [3], and
>> > > > > > > > > > > >> > >> am beginning to add missing content and
>> filling in the
>> > > > > Go
>> > > > > > > > > > specific gaps. This will be tied to improving the Go
>> Doc with
>> > > > > more Go
>> > > > > > > > > > > >> > >> specific user documentation that isn't
>> appropriate for
>> > > > > the
>> > > > > > > > BPG.
>> > > > > > > > > > > >> > >> And resolving the LICENSE issue around the
>> public
>> > > > > display of
>> > > > > > > > > > that GoDoc.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> If this proposal is accepted by a binding
>> vote, I will
>> > > > > > > > > > incorporate the SDK into the release process, and
>> remove the
>> > > > > > > > "experimental"
>> > > > > > > > > > > >> > >> language around the SDK. This largely entails
>> updating
>> > > > > the
>> > > > > > > > > > release scripts to also build and publish the Go SDK
>> Docker
>> > > > > containers.
>> > > > > > > > > > > >> > >> As for releasing the code, we're technically
>> already
>> > > > > doing so
>> > > > > > > > > > whenever we tag a release branch [4].
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> The clearest signal to the Go community
>> however will be
>> > > > > > > > > > migrating the SDK to use Go Modules for dependency
>> version
>> > > > > control,
>> > > > > > > > > > > >> > >> which Daniel is planning on working on after
>> his Kafka
>> > > > > task.
>> > > > > > > > > > This will put our repo infrastructure, SDK
>> contributors, and
>> > > > > users
>> > > > > > > > > > > >> > >> on the same footing when it comes to
>> dependency
>> > > > > management.
>> > > > > > > > It
>> > > > > > > > > > will remove the "+incompatible" tags one sees on the
>> > > > > > > > > > > >> > >> pkg.go.dev list at [4].
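The "+incompatible" suffix comes from a Go Modules rule: a v2 or higher tag on a module path whose root has no go.mod file is accepted, but marked "+incompatible". Once a go.mod exists, v2+ versions require a /vN suffix on the module path instead. The sketch below is a simplified model of that rule for illustration only; `listedVersion` and `majorOf` are hypothetical helpers, not go command internals.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf returns the major version of a semantic version tag such as "v2.32.0-RC1".
func majorOf(version string) (int, error) {
	if !strings.HasPrefix(version, "v") {
		return 0, fmt.Errorf("%q lacks a leading v", version)
	}
	major := strings.SplitN(strings.TrimPrefix(version, "v"), ".", 2)[0]
	return strconv.Atoi(major)
}

// listedVersion is a simplified model of how a tag is presented by the Go
// tooling: a v2+ tag for a module path whose root directory has no go.mod
// gets "+incompatible"; with a go.mod, the path must carry a /vN suffix and
// the tag is used as-is.
func listedVersion(tag string, hasGoMod bool) (string, error) {
	major, err := majorOf(tag)
	if err != nil {
		return "", err
	}
	if major >= 2 && !hasGoMod {
		return tag + "+incompatible", nil
	}
	return tag, nil
}

func main() {
	before, _ := listedVersion("v2.32.0-RC1", false) // github.com/apache/beam: no go.mod at the root
	after, _ := listedVersion("v2.33.0", true)       // a module path with go.mod and a /v2 suffix
	fmt.Println(before) // v2.32.0-RC1+incompatible
	fmt.Println(after)  // v2.33.0
}
```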
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> I'm very happy to answer any questions you
>> might have
>> > > > > about
>> > > > > > > > the
>> > > > > > > > > > SDK, and provide additional links as needed. I
>> intentionally
>> > > > > avoided
>> > > > > > > > > > > >> > >> a link barrage in this email, as they can
>> distract
>> > > > > from the
>> > > > > > > > > > point: The SDK is ready for folks to use it, we need to
>> tell
>> > > > > them that
>> > > > > > > > they
>> > > > > > > > > > can
>> > > > > > > > > > > >> > >> rather than they shouldn't.
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> Robert Burke
>> > > > > > > > > > > >> > >> Defacto Beam Go TL
>> > > > > > > > > > > >> > >>
>> > > > > > > > > > > >> > >> [0]
>> https://s.apache.org/beam-go-sdk-design-rfc
>> > > > > > > > > > > >> > >> [1]
>> > > > > > > > > >
>> > > > > > > >
>> > > > >
>> https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>> > > > > > > > > > > >> > >> [2]
>> > > > > https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>> > > > > > > > > > > >> > >> [3]
>> > > > > > > > > >
>> > > > > > > >
>> > > > >
>> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
>> > > > > > > > > > (SDK Audit sheet)
>> > > > > > > > > > > >> > >> [4]
>> > > > > > > > > >
>> > > > > > > >
>> > > > >
>> https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>> > > > > > > > > > > >> >
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>

Re: [Proposal] Go SDK Exits Experimental

Posted by Kyle Weaver <kc...@google.com>.
+1 to extra tags. They'll be trivial to add to our release process, and git
tags are lightweight by design so I don't foresee any problems.

On Tue, Oct 26, 2021 at 5:27 PM Robert Bradshaw <ro...@google.com> wrote:

> Glad you were able to figure it out. The extra tags are certainly
> worth making this work if it's what we have to do, and shouldn't be
> too much of a problem (until, hopefully, it's fixed on the go side).
>
> On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
> >
> > With Kyle's help with the additional tagging of the next RC, we have
> validated that this is the currently correct approach.
> >
> >
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
> >
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
> >
> > Or even:
> > https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam  (links
> to latest tagged version)
> >
> > The main cost to this approach is doubling the number of tags in the
> tags list: https://github.com/apache/beam/tags which is not ideal, but
> overall a small cost. There's no need for "full publish" of these
> additional tags, so we won't be doubling our "releases" (see
> https://github.com/apache/beam/releases).
> >
> > I'll still be filing a bug against the Go commands since the mandatory
> prefixing is unintuitive, and seems unnecessary. If it becomes so, we can
> always delete the tags from the affected branches, and cease the behavior
> going forward. I'll search through the existing Go issues first however to
> see if this has been previously discussed, and report my findings here
> either way.
> >
> > This does require 2 small changes to release guide: The rc tagging
> script, and the final tagging:
> >
> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
> >
> >
> https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
> >
> > I'll make this change later this week (or early next) assuming there are
> no objections.
> >
> > Thank you all very much for your patience,
> > Robert Burke
> > Beam Go Busybody
> >
> >
> > On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
> > > With much research in reading the Go Modules documentation, I have confirmed what the issue is.
> > >
> > > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change, captures the Java and Python container boot code (written in Go) into the module, and avoids conflicts in interpretations of the vendor directory that lives at the root level.
> > >
> > > However, we missed that when doing so, the standard version tags would only apply to modules at the root level, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version; quoting the important paragraph:
> > >
> > > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of the module path is not empty, then each tag name must be prefixed with the module subdirectory, followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.
> > >
> > > Specifically, for the Go SDK to be fetched at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
> > >
> > > So, the fix for the Go versioning issue is to amend our Release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
> > >
> > > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can back-update the 2.33.0 release branch with such a prefixed tag, at which point I can also write the official Experimental Exit blog post.
> > >
> > > Thank you all for your patience.
> > > Robert Burke
> > >
> > > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
> > > > Thank you for the detailed update! Let us know if we can help.
> > > >
> > > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:
> > > >
> > > > > This is a status update.
> > > > >
> > > > > At this point 2.33.0 is released, but there are difficulties with accessing the tagged versions using the standard go tools. It's currently under investigation.
> > > > >
> > > > > Using the v2 path in a Go program and then running `go mod tidy` will populate the go.mod file with a pseudo-version rather than the latest tag (v2.33.0). For example, the line looks like:
> > > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
> > > > >
> > > > > While this will work, it's not the desired experience for users at this point. The current downside is that the releases are not meaningful targets, for some reason. However, we retain the other benefits of Go Modules (actual dependency versioning, management by the go tools).
> > > > >
> > > > > The issue is some combination of the go tooling [A], the fact that we added a go.mod file outside of the repo root [B], and the fact that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].
> > > > >
> > > > > [B] From the go documentation, this should be legal and fine, even if it's not recommended. This is fortunate, because a go.mod at the root of the repo would have played poorly with the root vendor directory, which the go tools have opinions on.
> > > > >
> > > > > [C] The Go Modules documentation recommends incrementing the major version when transitioning to Go Modules. However, it never said it was required, nor did it indicate this current failure mode. If anything, this should be documented in those docs, if it's not another bug. We would not necessarily want to declare a global v3 for Beam at this time just for the Go SDK; it would become confusing rather quickly. Notionally, there are some larger breaking changes the Java and Python SDKs would want to make in such an event, and thus it's a larger conversation that is out of scope at this time.
> > > > >
> > > > > This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root go module to be inherited from the parent, not wholesale ignored. As a result, I'll be filing a bug against the go tools to determine this, and see what paths forward exist.
> > > > >
> > > > > It's my hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.
> > > > >
> > > > > Thank you for your patience, and time.
> > > > > Robert Burke
> > > > > Beam Go Busybody
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
> > > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions [2].
> > > > > >
> > > > > > The module file lives in the sdks/ directory, so there's a single Go Module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go modules of the Go SDK code katas [3], which can be updated once 2.33.0 has been released.
> > > > > >
> > > > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
> > > > > >
> > > > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
> > > > > >
> > > > > > Cheers,
> > > > > > Robert Burke
> > > > > > Defacto Beam Go TL.
> > > > > >
> > > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> > > > > > [2] https://github.com/apache/beam/pull/15323
> > > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
> > > > > > [4] https://github.com/apache/beam/pull/15365
> > > > > >
> > > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
> > > > > > > +1, congratulations & thank you!
> > > > > > >
> > > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lostluck@apache.org> wrote:
> > > > > > >
> > > > > > > > Regarding the documentation update: the initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
> > > > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
> > > > > > > >
> > > > > > > >
> > > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > > > > > > Yup!
> > > > > > > > >
> > > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG.
> > > > > > > > >
> > > > > > > > > My audit of the guide is here:
> > > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > > > > >
> > > > > > > > > The other sheets focus on features and tests. The feature page looks worse than it is, as it was more productive to focus on what isn't available than on what is. That's a snapshot of my actual working sheet, but I'll be updating it as needed.
> > > > > > > > >
> > > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <iemejia@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Oops, I forgot to write one question. Will this come with revamped website instructions/docs for golang too?
> > > > > > > > > >
> > > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <iemejia@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Huge +1
> > > > > > > > > > >
> > > > > > > > > > > This is definitely something many people have asked about, so it is great to see it finally happening.
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > +1 awesome
> > > > > > > > > > > >
> > > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
> > > > > > > > > > > >>
> > > > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder break between a newer Generics-first API and the current version, but there's no rush. Generics/Type Parameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> However, due to the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So, adding new helpers like KV, emitter, and Iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of Stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, adding more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and simplifying getting improved execution performance.
> > > > > > > > > > > >>
> > > > > > > > > > > > >> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply chaining approach, or at least prevents it from working simply.
> > > > > > > > > > > >>
> > > > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
> > > > > > > > > > > > >>
> > > > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
> > > > > > > > > > > >>
> > > > > > > > > > > >> Cheers,
> > > > > > > > > > > >> Robert Burke
> > > > > > > > > > > >>
> > > > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
> > > > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules issues are solved. I don't think an SDK needs to support every feature to be accepted, especially now that we can do cross-language transforms, and Go definitely supports enough to be quite useful. (WRT streaming, my understanding is that Go supports the streaming model with windows and timestamps, and runs fine on a streaming runner, even if more advanced features like state and timers aren't yet available.)
> > > > > > > > > > > > >> >
> > > > > > > > > > > > >> > This is a great milestone.
> > > > > > > > > > > >> >
> > > > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > WOW! Big news.
> > > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics it won't result in any backwards incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > >> > >
> > > > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Hello Beam Community!
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> This thread is to discuss it as a community, along with any conditions that remain that would prevent the exit.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> tl;dr;
> > > > > > > > > > > > >> > >> Ask Questions for answers and links! I have both.
> > > > > > > > > > > > >> > >> This entails including it officially in the Release process, removing the various "experimental" text throughout the repo, etc.,
> > > > > > > > > > > > >> > >> and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The Go SDK implements the beam model efficiently for most batch tasks, including basic windowing.
> > > > > > > > > > > > >> > >> Apache Beam Go jobs can execute on, and are tested against, all Portable runners.
> > > > > > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
> > > > > > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via Cross Language transforms.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
> > > > > > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
> > > > > > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Why Exit Experimental now?
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's because there's a risk that API or behaviors may change significantly. This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go. Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail on what it means for a language without Generics, or overloading, or inheritance to implement the beam model. One could largely throw away static types (like Python), but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard to tell if an API is any good before there are users.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible bottleneck to need to do all initial fanout of work on a single machine, and write everything to a Reshuffle, just in order to scale up. Without being able to scale, Beam is little more than overhead.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Background
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The Go SDK has been a part of the beam repo for a few years now, since it was accidentally merged into master. Since then it's been called experimental, and not officially part of the releases.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Among the SDKs, it was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers. It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> API Stability
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no changes to that on the horizon that can't be made in a backwards compatible manner. Largely these are related to New Features, or usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> It's an open secret that the Go SDK has largely been under work for use within Google. Its use there is called FlumeGo, representing the Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus has been on improving batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or usability concerns.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Scalability
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Google could get away without the Go SDK having an SDK-side scalability solution as a result of its integration with Flume. However, those days are now past.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which supports writing scalable batch transforms natively in the Go SDK.
> > > > > > > > > > > > >> > >> The SDK also supports Cross Language Transforms, with Beam Schema encodings. With it, production-hardened transforms from Java and Python are a wrapper away.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF-side work and completed the Xlang work) is adding a wrapper for the Java Kafka IO using Cross Language Transforms, which has often been requested. This will also enable use of the Beam SQL transforms that Java enables.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Features
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The Go SDK implements the Beam core. The Go SDK implements standard coders, allows for user DoFns and CombineFns, and provides access to core transforms like Flatten and GroupByKey, and features like Side Inputs, Windowing, and User Metrics.
> > > > > > > > > > > > >> > >> Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.
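For context on "lifted combines": a CombineFn is broken into accumulator operations so that a runner can pre-combine values on each worker before the GroupByKey shuffle and merge the partial results afterwards. The method names below (`CreateAccumulator`, `AddInput`, `MergeAccumulators`, `ExtractOutput`) follow the Beam CombineFn shape, but the `meanFn` type and the `runLifted` driver are hypothetical plain-Go stand-ins for what a runner does, not SDK code.

```go
package main

import "fmt"

// meanAccum is the partial state for computing a mean.
type meanAccum struct {
	Sum   float64
	Count int
}

// meanFn computes the mean of its inputs using the CombineFn method shape.
type meanFn struct{}

func (meanFn) CreateAccumulator() meanAccum { return meanAccum{} }

func (meanFn) AddInput(a meanAccum, v float64) meanAccum {
	return meanAccum{Sum: a.Sum + v, Count: a.Count + 1}
}

func (meanFn) MergeAccumulators(a, b meanAccum) meanAccum {
	return meanAccum{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAccum) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// runLifted simulates combiner lifting: each bundle is pre-combined into a
// single accumulator (before the shuffle), then the partial accumulators are
// merged and the output extracted (after the shuffle).
func runLifted(fn meanFn, bundles [][]float64) float64 {
	partials := make([]meanAccum, 0, len(bundles))
	for _, b := range bundles {
		acc := fn.CreateAccumulator()
		for _, v := range b {
			acc = fn.AddInput(acc, v)
		}
		partials = append(partials, acc)
	}
	merged := fn.CreateAccumulator()
	for _, p := range partials {
		merged = fn.MergeAccumulators(merged, p)
	}
	return fn.ExtractOutput(merged)
}

func main() {
	// Two workers each see part of the data; only two small accumulators
	// need to cross the shuffle instead of all five values.
	fmt.Println(runLifted(meanFn{}, [][]float64{{1, 2, 3}, {4, 5}}))
	// Output: 3
}
```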
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Repo Testing
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> On precommit, the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner, making it quick and reliable to detect breaking changes without overspending community resources. Those same tests are also run against Dataflow, Flink, and Spark.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server), or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones is on the wiki [2]. They are accessible to Go developers, as they're implemented with the standard Go testing tools.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Shortcomings
> > > > > > > > > > > > >> > >> That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether these gaps block leaving experimental.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> At present, only textio has been implemented as a Splittable DoFn.
> > > > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of new transform wrappers for the Go SDK.
> > > > > > > > > > > > >> > >> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> In the core SDK, more streaming-focused features have yet to be implemented, but they're largely additions to what exists already rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] about what's missing for additional streaming features.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK, in particular regarding the included transforms libraries and examples.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > >> > >> Moving Forward
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific user documentation that isn't appropriate for the BPG, and to resolving the LICENSE issue around the public display of that GoDoc.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental" language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers. As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].
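Why the "+incompatible" suffix appears: the go command treats a v2-or-higher tag as a normal module version only when the module path ends in the matching major-version suffix (such as `/v2`), which requires a go.mod declaring that path. A v2+ tag on a repository with no go.mod is accepted only as a "+incompatible" version. The minimal sketch below checks just the path/version half of that rule; `pathMajor`, `versionMajor`, and `compatible` are hypothetical helpers, and the complete rules (for example, special cases for gopkg.in paths) live in the go command and `golang.org/x/mod`.

```go
package main

import (
	"fmt"
	"strings"
)

// pathMajor returns the "vN" major-version suffix of a module path
// (N >= 2, no leading zero), or "" if the path has no such suffix.
func pathMajor(modPath string) string {
	i := strings.LastIndex(modPath, "/v")
	if i < 0 {
		return ""
	}
	digits := modPath[i+2:]
	if digits == "" || digits[0] == '0' || digits == "1" {
		return ""
	}
	for _, c := range digits {
		if c < '0' || c > '9' {
			return ""
		}
	}
	return modPath[i+1:]
}

// versionMajor returns the major component of a semver tag, e.g. "v2".
func versionMajor(version string) string {
	if i := strings.IndexByte(version, '.'); i >= 0 {
		return version[:i]
	}
	return version
}

// compatible reports whether version can be a normal (non-+incompatible)
// version of the module at modPath under the major version suffix rule:
// v0/v1 need no suffix, while vN (N >= 2) needs a matching "/vN" suffix.
func compatible(modPath, version string) bool {
	pm, vm := pathMajor(modPath), versionMajor(version)
	if vm == "v0" || vm == "v1" {
		return pm == ""
	}
	return pm == vm
}

func main() {
	fmt.Println(compatible("github.com/apache/beam", "v2.32.0"))         // false
	fmt.Println(compatible("github.com/apache/beam/sdks/v2", "v2.33.0")) // true
	fmt.Println(compatible("github.com/apache/beam/sdks/v2", "v3.0.0"))  // false
}
```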
> > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can, rather than that they shouldn't.
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> Robert Burke
> > > > > > > > > > > > >> > >> Defacto Beam Go TL
> > > > > > > > > > > > >> > >>
> > > > > > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > > > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > > > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > > > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> > > > > > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > > > > > > > > > >> >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
>
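As a rough illustration of where the "+incompatible" suffix on pkg.go.dev comes from, here is a sketch that applies a simplified form of the rule in the Go modules reference. A release tag with major version 2 or higher can only be used on a module path without a matching /vN suffix if that version has no go.mod file, and the go command then records it with "+incompatible". The helper names `majorOf` and `recordedVersion` and the `hasGoMod` flag are hypothetical. The real go command performs more checks than this sketch does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorOf returns the major version of a semver tag such as v2.32.0-RC1.
func majorOf(tag string) (int, error) {
	if !strings.HasPrefix(tag, "v") {
		return 0, fmt.Errorf("tag %q does not start with v", tag)
	}
	rest := tag[1:]
	end := strings.IndexAny(rest, ".-+")
	if end < 0 {
		end = len(rest)
	}
	return strconv.Atoi(rest[:end])
}

// recordedVersion approximates (simplified) how the go command treats a
// release tag for modulePath:
//   - major 0 or 1: the tag is used as-is;
//   - major >= 2 with a matching /vN path suffix: used as-is;
//   - major >= 2, no suffix, no go.mod: recorded with "+incompatible";
//   - major >= 2, no suffix, but a go.mod: rejected.
func recordedVersion(modulePath, tag string, hasGoMod bool) (string, error) {
	major, err := majorOf(tag)
	if err != nil {
		return "", err
	}
	if major < 2 || strings.HasSuffix(modulePath, "/v"+strconv.Itoa(major)) {
		return tag, nil
	}
	if !hasGoMod {
		return tag + "+incompatible", nil
	}
	return "", fmt.Errorf("%s has a go.mod but no /v%d suffix, so %s is not usable", modulePath, major, tag)
}

func main() {
	// Pre-modules layout: no go.mod and no /v2 in the path.
	v, _ := recordedVersion("github.com/apache/beam", "v2.32.0-RC1", false)
	fmt.Println(v) // prints v2.32.0-RC1+incompatible

	// Modules layout: sdks/go.mod declares the /v2 path.
	v, _ = recordedVersion("github.com/apache/beam/sdks/v2", "v2.33.0", true)
	fmt.Println(v) // prints v2.33.0
}
```

This is why adding a go.mod whose module path carries the /v2 suffix was expected to make the "+incompatible" entries go away.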

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Bradshaw <ro...@google.com>.
Glad you were able to figure it out. The extra tags are certainly
worth making this work if it's what we have to do, and shouldn't be
too much of a problem (until, hopefully, it's fixed on the Go side).
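The quoted message below cites the tag-prefix rule from the Go modules reference. To make that rule concrete, here is a minimal sketch that derives the tag name the go command would look for. It assumes the module's go.mod lives in the directory named by the module path minus its /vN suffix, as with sdks/go.mod declaring github.com/apache/beam/sdks/v2. `vcsTagFor` is a hypothetical helper and is not part of Beam or the Go tooling.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major-version path element such as /v2 or /v10.
var majorSuffix = regexp.MustCompile(`/v([2-9]|[1-9][0-9]+)$`)

// vcsTagFor returns the git tag name for version of modulePath, where the
// module lives in the repository whose root module path is repoRoot.
// Assumption: the /vN suffix is not a directory (go.mod sits in the parent),
// so it is not part of the tag prefix.
func vcsTagFor(repoRoot, modulePath, version string) (string, error) {
	if modulePath != repoRoot && !strings.HasPrefix(modulePath, repoRoot+"/") {
		return "", fmt.Errorf("module %q is not inside repository %q", modulePath, repoRoot)
	}
	rel := strings.TrimPrefix(modulePath, repoRoot) // e.g. "/sdks/v2", "/gopls", ""
	rel = majorSuffix.ReplaceAllString(rel, "")      // drop the major-version suffix
	rel = strings.TrimPrefix(rel, "/")
	if rel == "" {
		return version, nil // root module: plain tags such as v2.33.0
	}
	return rel + "/" + version, nil // subdirectory module: prefixed tag
}

func main() {
	for _, c := range []struct{ root, mod, ver string }{
		{"github.com/apache/beam", "github.com/apache/beam/sdks/v2", "v2.34.0-RC1"},
		{"golang.org/x/tools", "golang.org/x/tools/gopls", "v0.4.0"},
	} {
		tag, err := vcsTagFor(c.root, c.mod, c.ver)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(tag)
	}
	// prints:
	// sdks/v2.34.0-RC1
	// gopls/v0.4.0
}
```

Under these assumptions, each Beam release needs two tags: the existing vX.Y.Z tag and an sdks/vX.Y.Z tag.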

On Tue, Oct 26, 2021 at 4:53 PM Robert Burke <lo...@apache.org> wrote:
>
> With Kyle's help with the additional tagging of the next RC, we have validated that this is the currently correct approach.
>
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
> https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam
>
> Or even:
> https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam  (links to latest tagged version)
>
> The main cost of this approach is doubling the number of tags in the tags list: https://github.com/apache/beam/tags which is not ideal, but overall a small cost. There's no need for "full publish" of these additional tags, so we won't be doubling our "releases" (see https://github.com/apache/beam/releases).
>
> I'll still be filing a bug against the Go commands, since the mandatory prefixing is unintuitive and seems unnecessary. If the prefix requirement is ever dropped, we can always delete the tags from the affected branches and cease the behavior going forward. I'll first search through the existing Go issues, however, to see if this has been previously discussed, and report my findings here either way.
>
> This does require 2 small changes to the release guide: the RC tagging script, and the final tagging:
> https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh
>
> https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag
>
> I'll make this change later this week (or early next) assuming there are no objections.
>
> Thank you all very much for your patience,
> Robert Burke
> Beam Go Busybody
>
>
> On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote:
> > With much research in reading the Go Modules documentation, I have confirmed what the issue is.
> >
> > We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change, it captures the Java and Python container boot code (written in Go) in the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.
> >
> > However, we missed that, when doing so, the standard version tags would only apply to modules at the root level, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version; quoting the important paragraph:
> >
> > > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of
> > > the module path is not empty, then each tag name must be prefixed with the module subdirectory,
> > > followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls
> > > subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.
> >
> > Specifically, for the Go SDK to be fetched at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
> >
> > So, the fix for the Go versioning issue is to amend our Release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
> >
> > I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can retroactively add such a prefixed tag to the 2.33.0 release branch, at which point I can also write the official Experimental Exit blog post.
> >
> > Thank you all for your patience.
> > Robert Burke
> >
> > On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote:
> > > Thank you for the detailed update! Let us know if we can help.
> > >
> > > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:
> > >
> > > > This is a status update.
> > > >
> > > > At this point 2.33.0 is released, but there are difficulties with
> > > > accessing the tagged versions using the standard Go tools. It's currently
> > > > under investigation.
> > > >
> > > > Using the v2 path in a Go program and then running `go mod tidy` will populate
> > > > the go.mod file with a pseudo-version rather than the latest tag (v2.33.0). For
> > > > example, the line looks like:
> > > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
> > > >
> > > > While this will work, it's not the desired experience for users at this
> > > > point. The current downside is that the releases are, for some reason, not
> > > > meaningful targets. However, we retain the other benefits of Go Modules (actual
> > > > dependency versioning, management by Go tools).
> > > >
> > > > The issue is some combination of the Go tooling [A], that we added a go.mod
> > > > file outside of the repo root [B], and that we did not increment the
> > > > major version (v2 -> v3) when adding the go.mod file [C].
> > > >
> > > > [B] From the Go documentation, this should be legal and fine, even if it's
> > > > not recommended. This is fortunate, because a go.mod at the root of the repo
> > > > would have played poorly with the root vendor directory, which the Go tools
> > > > have opinions on.
> > > >
> > > > [C] Incrementing the major version is recommended in the Go Modules
> > > > documentation when transitioning to Go Modules. However, it never says it
> > > > is required, nor does it describe this failure mode. If this isn't another
> > > > bug, it should at least be documented there. We would not necessarily want
> > > > to declare a global v3 for Beam at this time just for the Go SDK; it would
> > > > become confusing rather quickly. Notionally, the Java and Python SDKs would
> > > > want to make some larger breaking changes in such an event, so it's a larger
> > > > conversation that is out of scope at this time.
> > > >
> > > > This leaves [A], where some misunderstanding of the documented semantics
> > > > occurred. I certainly expected the tagged version of the non-root Go module
> > > > to be inherited from the parent, not wholesale ignored. As a result, I'll
> > > > be filing a bug against the Go tools to determine this, and see what paths
> > > > forward exist.
> > > >
> > > > It's my hope to resolve this before we write a proper Experimental Exit
> > > > blog post for the Go SDK.
> > > >
> > > > Thank you for your patience, and time.
> > > > Robert Burke
> > > > Beam Go Busybody
> > > >
> > > >
> > > >
> > > >
> > > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
> > > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go
> > > > > Modules for dependency management, simplifying Go SDK contributions. [2]
> > > > >
> > > > > The Module file lives in the sdks/ directory, so there's a single Go
> > > > > Module for the whole SDK, tests, examples, and any support code for the
> > > > > container boot builds. This excludes the Go modules of the Go SDK code
> > > > > katas [3], which can be updated once 2.33.0 has been released.
> > > > >
> > > > > PR 15365 [4] adds the SDK containers back to the release builds, and
> > > > > uses the release-specific container by default for Docker execution jobs.
> > > > > For at least the 2.33.0 release, this means that manual validation will
> > > > > need to explicitly specify RC versions of containers. However, given that
> > > > > the Go SDK container and worker boot process rarely change, this is
> > > > > unlikely to be an issue.
> > > > >
> > > > > At present I'm cleaning up some of the references to experimental, and
> > > > > making it clear that 2.33.0 is the first non-experimental release (even
> > > > > though that's 4-6 weeks out from actual release). CHANGES.md will be
> > > > > updated to note the event, but a larger blog post will happen after the
> > > > > release goes public.
> > > > >
> > > > > Cheers,
> > > > > Robert Burke
> > > > > Defacto Beam Go TL.
> > > > >
> > > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> > > > > [2] https://github.com/apache/beam/pull/15323
> > > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
> > > > > [4] https://github.com/apache/beam/pull/15365
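Because the single module is declared at sdks/ with the path github.com/apache/beam/sdks/v2, the /v2 element in import paths is a major-version suffix rather than a directory. Here is a minimal sketch of that mapping. It assumes every package under sdks/ belongs to this one module, while the Go katas, per the message above, keep their own modules. `repoDir` is hypothetical, and the Java container path and the kata path in the example are illustrative assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// sdksModule is the module path declared in sdks/go.mod.
const sdksModule = "github.com/apache/beam/sdks/v2"

// repoDir maps an import path to its directory in the Beam repository and
// reports false for packages outside the sdks/ module. The /v2 element is a
// major-version suffix, so .../sdks/v2/X lives in the directory sdks/X.
func repoDir(importPath string) (string, bool) {
	if importPath == sdksModule {
		return "sdks", true
	}
	if !strings.HasPrefix(importPath, sdksModule+"/") {
		return "", false
	}
	return "sdks/" + strings.TrimPrefix(importPath, sdksModule+"/"), true
}

func main() {
	for _, p := range []string{
		"github.com/apache/beam/sdks/v2/go/pkg/beam",
		"github.com/apache/beam/sdks/v2/java/container",    // assumed location of Java boot code
		"github.com/apache/beam/learning/katas/go/example", // hypothetical kata package
	} {
		dir, ok := repoDir(p)
		fmt.Printf("%s -> %q (in sdks module: %v)\n", p, dir, ok)
	}
}
```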
> > > > >
> > > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
> > > > > > +1, congratulations & thank you!
> > > > > >
> > > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:
> > > > > >
> > > > > > > Regarding documentation update: Initial PR is
> > > > > > > https://github.com/apache/beam/pull/15057 which goes up to section ~4.3.
> > > > > > > JIRA link for Programming Guide changes:
> > > > > > > https://issues.apache.org/jira/browse/BEAM-12513
> > > > > > >
> > > > > > >
> > > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > > > > > Yup!
> > > > > > > >
> > > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the
> > > > > > > > Beam Programming Guide. I've audited the guide, and
> > > > > > > > am beginning to add missing content and fill in the Go-specific gaps.
> > > > > > > > This will be tied to improving the Go Doc with more Go-specific
> > > > > > > > user documentation that isn't appropriate for the BPG.
> > > > > > > >
> > > > > > > > My audit of the guide is here:
> > > > > > > >
> > > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > > > >
> > > > > > > > The other sheets focus on features and tests. The feature page looks worse
> > > > > > > > than it is, as it was more productive to focus on what isn't available than
> > > > > > > > what is. That's a snapshot of my actual working sheet, but I'll be updating
> > > > > > > > it as needed.
> > > > > > > >
> > > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Oops, forgot to write one question. Will this come with revamped
> > > > > > > > > website instructions/doc for golang too?
> > > > > > > > >
> > > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Huge +1
> > > > > > > > > >
> > > > > > > > > > This is definitely something many people have asked about, so it is
> > > > > > > > > > great to see it finally happening.
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
> > > > > > > > > > >
> > > > > > > > > > > +1 awesome
> > > > > > > > > > >
> > > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > > > >>
> > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules
> > > > > > > > > > >> and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33
> > > > > > > > > > >> cut if release images aren't added to the 2.32 process.
> > > > > > > > > > >>
> > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a
> > > > > > > > > > >> harder break between a newer generics-first API and the current version,
> > > > > > > > > > >> but there's no rush. Generics/TypeParameters in Go aren't identical to the
> > > > > > > > > > >> feature referred to by that term in Java, C++, Rust, etc., so it'll take a
> > > > > > > > > > >> bit of time for that expertise to develop.
> > > > > > > > > > >>
> > > > > > > > > > >> However, by the current nature of Go, we had to have pretty
> > > > > > > > > > >> sophisticated reflective analysis to handle DoFns and map them to their
> > > > > > > > > > >> graph inputs. So, adding new helpers like KV, emitter, and iterator
> > > > > > > > > > >> types shouldn't be too difficult. Changes to Go SDK internals to use
> > > > > > > > > > >> generics (like the implementation of stats DoFns such as Min, Max, etc.)
> > > > > > > > > > >> could also be made transparently to most users, and certainly any of
> > > > > > > > > > >> the framework for execution-time handling (the "worker's SDK harness")
> > > > > > > > > > >> could be cleaned up if need be. Finally, more sophisticated DoFn
> > > > > > > > > > >> registration and code generation could replace the optional code
> > > > > > > > > > >> generator entirely, saving some users a `go generate` step and making
> > > > > > > > > > >> improved execution performance easier to get.
> > > > > > > > > > >>
> > > > > > > > > > >> Changes like making a type-parameterized PCollection would be far
> > > > > > > > > > >> more involved, as would trying to use some kind of Apply format. The
> > > > > > > > > > >> lack of method overrides prevents the apply-chaining approach, or at
> > > > > > > > > > >> least prevents it from working simply.
> > > > > > > > > > >>
> > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't
> > > > > > > > > > >> until next year. See https://blog.golang.org/generics-proposal for
> > > > > > > > > > >> details.
> > > > > > > > > > >>
> > > > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does include a register-based
> > > > > > > > > > >> calling convention, leading to a modest performance improvement across
> > > > > > > > > > >> the board.
> > > > > > > > > > >>
> > > > > > > > > > >> Cheers,
> > > > > > > > > > >> Robert Burke
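Here is a sketch of what "real" KV, emitter, and iterator types could look like once generics are available (Go 1.18 or later). The SDK's existing conventions are iterator parameters of the form func(*V) bool and emitters of the form func(T). The generic `KV`, `Emitter`, and `Iterator` types and the `sumPerKey` and `sliceIter` functions below are hypothetical names for illustration only. They are not the Beam Go SDK API.

```go
package main

import "fmt"

// KV is a hypothetical generic key/value pair (not the Beam Go SDK API).
type KV[K, V any] struct {
	Key   K
	Value V
}

// Emitter mirrors the SDK's func(T) emitter convention with a type parameter.
type Emitter[T any] func(T)

// Iterator mirrors the SDK's func(*V) bool iterator convention: each call
// fills *out and returns true, or returns false once values are exhausted.
type Iterator[T any] func(out *T) bool

// sumPerKey is a DoFn-shaped function written against the generic types:
// it drains the grouped values for one key and emits a single KV.
func sumPerKey(key string, values Iterator[int], emit Emitter[KV[string, int]]) {
	total := 0
	var v int
	for values(&v) {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

// sliceIter adapts a slice to the Iterator convention for local testing.
func sliceIter[T any](xs []T) Iterator[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

func main() {
	var got []KV[string, int]
	sumPerKey("a", sliceIter([]int{1, 2, 3}), func(kv KV[string, int]) {
		got = append(got, kv)
	})
	fmt.Println(got) // prints [{a 6}]
}
```

The types are plain aliases over the existing function shapes, so helpers like these can sit alongside the current reflective API without replacing it.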
> > > > > > > > > > >>
> > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
> > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules
> > > > > > > > > > >> > issues are solved. I don't think an SDK needs to support every feature
> > > > > > > > > > >> > to be accepted, especially now that we can do cross-language
> > > > > > > > > > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
> > > > > > > > > > >> > streaming, my understanding is that Go supports the streaming model
> > > > > > > > > > >> > with windows and timestamps, and runs fine on a streaming runner, even
> > > > > > > > > > >> > if more advanced features like state and timers aren't yet available.)
> > > > > > > > > > >> >
> > > > > > > > > > >> > This is a great milestone.
> > > > > > > > > > >> >
> > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > WOW! Big news.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are
> > > > > > > > > > >> > > completed and the LICENSE issue is resolved. I don't think that lacking
> > > > > > > > > > >> > > streaming support is a blocker. The other thing I checked was whether
> > > > > > > > > > >> > > there were metrics available on metrics.beam.apache.org, specifically for
> > > > > > > > > > >> > > measuring code health via post-commit over time; there are, and the
> > > > > > > > > > >> > > passing test rate is high (Huzzah!). The one thing that surprised me from
> > > > > > > > > > >> > > your summary is that when Go introduces generics it won't result in any
> > > > > > > > > > >> > > backwards-incompatible changes in Apache Beam. That's great news, but does
> > > > > > > > > > >> > > it mean there will be a need to support both non-generic and generic APIs
> > > > > > > > > > >> > > moving forward? It seems like generics will be introduced in the Go 1.17
> > > > > > > > > > >> > > release (optimistically) in August this year.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <
> > > > > > > lostluck@apache.org>
> > > > > > > > > wrote:
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Hello Beam Community!
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK
> > > > > > > experimental.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> This thread is to discuss it as a community, and any
> > > > > > > conditions
> > > > > > > > > that remain that would prevent the exit.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> tl;dr;
> > > > > > > > > > >> > >> Ask Questions for answers and links! I have both.
> > > > > > > > > > >> > >> This entails including it officially in the Release
> > > > process,
> > > > > > > > > removing the various "experimental" text throughout the repo etc,
> > > > > > > > > > >> > >> and otherwise treating it like Python and Java. Some Go
> > > > > > > specific
> > > > > > > > > tasks around dep versioning.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The Go SDK implements the beam model efficiently for
> > > > most
> > > > > > > batch
> > > > > > > > > tasks, including basic windowing.
> > > > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested on all
> > > > > > > Portable
> > > > > > > > > runners.
> > > > > > > > > > >> > >> The core APIs are not going to change in incompatible
> > > > ways
> > > > > > > going
> > > > > > > > > forward.
> > > > > > > > > > >> > >> Scalable transforms can be written through
> > > > SplittableDoFns or
> > > > > > > > > via Cross Language transforms.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it
> > > > > > > experimental
> > > > > > > > > doesn't help with that any further.
> > > > > > > > > > >> > >> Communities grow through contributions and use, and
> > > > > > > experimental
> > > > > > > > > markers dissuade users.
> > > > > > > > > > >> > >> There's plenty to do in order expand what can be done
> > > > with
> > > > > > > the
> > > > > > > > > SDK. (Contributions welcome)
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Why Exit Experimental now?
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's
> > > > > > > because
> > > > > > > > > there's a risk that API or behaviors may change significantly.
> > > > > > > > > > >> > >> This in turn, leads to additional work for users of
> > > > the SDK
> > > > > > > on
> > > > > > > > > every release which leads to sticking to older versions or
> > > > forking
> > > > > > > > > > >> > >> to preserve behavior. Version updates should be looked
> > > > > > > forward
> > > > > > > > > to, and viewed as having little risk. Further while there's been
> > > > > > > > > > >> > >> previous dicussion about what the "low bar" is for a
> > > > new
> > > > > > > SDK, it
> > > > > > > > > hasn't been summarily applied to the Go SDK. I feel this has
> > > > > > > > > > >> > >> hurt development and contribution of new SDK languages
> > > > > > > (inherent
> > > > > > > > > difficulty of SDK development notwithstanding).
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear
> > > > what the
> > > > > > > > > Beam Model should look like in an opinionated language like Go.
> > > > > > > > > > >> > >> Their initial take (see
> > > > > > > > > https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into
> > > > detail
> > > > > > > what it
> > > > > > > > > means for a language without
> > > > > > > > > > >> > >> Generics, or overloading, or inheritance to implement
> > > > the
> > > > > > > beam
> > > > > > > > > model. One could largely throw away static types (like Python),
> > > > > > > > > > >> > >> but this approach rings hollow for Go. It would not do
> > > > if the
> > > > > > > > > approach couldn't grow and scale to the Beam Model. It's also
> > > > hard
> > > > > > > > > > >> > >> to tell if an API is any good before there are users.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Further, in the early days of Portability, there
> > > > wasn't a
> > > > > > > way to
> > > > > > > > > write scalable DoFns, dynamically or otherwise. It's an
> > > > incredible
> > > > > > > > > > >> > >> bottleneck to need to do all initial fanout of work on
> > > > a
> > > > > > > single
> > > > > > > > > machine, write everything to a Reshuffle, just in order to scale
> > > > up.
> > > > > > > > > > >> > >> Without being able to scale, Beam is little more than
> > > > > > > overhead.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> At this point, both of these needs are met within the
> > > > Go SDK
> > > > > > > for
> > > > > > > > > open source.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Background
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The Go SDK has been a part of the beam repo for a few
> > > > years
> > > > > > > now,
> > > > > > > > > since it was accidentally merged into master.
> > > > > > > > > > >> > >> Since then it's been called experimental, and not
> > > > officially
> > > > > > > > > part of the releases.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Of the SDKs, it's was always designed around Beam
> > > > Portability
> > > > > > > > > first. It never had any "Legacy" (SDK x Runner specific )
> > > > workers.
> > > > > > > > > > >> > >> It's always used the Beam Pipeline protos and FnAPI to
> > > > > > > execute
> > > > > > > > > jobs, first with some very experimental code on Dataflow, but now
> > > > > > > > > > >> > >> on all portable supported runners, like Flink, Spark,
> > > > the
> > > > > > > Python
> > > > > > > > > Portable runner, and Dataflow.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> API Stability
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed it's user API
> > > > for DoFn
> > > > > > > > > and pipeline construction since it was first merged in, and
> > > > there are
> > > > > > > no
> > > > > > > > > > >> > >> changes to that on the horizon that can't be made in a
> > > > > > > backwards
> > > > > > > > > compatible manner. Largely these are related to New Features, or
> > > > > > > > > > >> > >> usability improvements enabled by the advent of Go
> > > > Generics
> > > > > > > > > (think of "real" KV, emitter, and iterator types).
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> It's an open secret that the Go SDK has largely been
> > > > under
> > > > > > > work
> > > > > > > > > for use within Google. It's use is called FlumeGo, representing
> > > > > > > > > > >> > >> the Apache Beam Go SDK, running on top of Flume,
> > > > Google's
> > > > > > > batch
> > > > > > > > > pipeline processing engine. Thus most of the focus on improving
> > > > > > > > > > >> > >> batch execution. FlumeGo sees ample use today, and
> > > > there
> > > > > > > hasn't
> > > > > > > > > been a call for fundamental changes to the API for ergonomic or
> > > > > > > > > > >> > >> usability concerns.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Scalability
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Google could get away without the Go SDK having an SDK
> > > > side
> > > > > > > > > scalability solution as a result of it's integration with Flume.
> > > > > > > > > > >> > >> However, those days are now past.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with
> > > > Dynamic
> > > > > > > > > Splitting, which supports writing scalable batch transforms
> > > > natively
> > > > > > > > > > >> > >> in the Go SDK.
> > > > > > > > > > >> > >> The SDK also supports Cross Language Transforms, with
> > > > Beam
> > > > > > > > > Schema encodings. With it, production hardened transforms
> > > > > > > > > > >> > >> from Java and Python are a wrapper away.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF
> > > > side
> > > > > > > work,
> > > > > > > > > and completed the Xlang work,) is adding a wrapper for the
> > > > > > > > > > >> > >> Java Kafka IO using Cross Language Transforms, which
> > > > is often
> > > > > > > > > been requested. This will also enable use of the Beam SQL
> > > > > > > > > > >> > >> transforms that java enables.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Features
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The Go SDK implements the Beam C=core. The Go SDK
> > > > implements
> > > > > > > > > standard coders, allows for user DoFns, and CombineFns and access
> > > > > > > > > > >> > >> to core transforms like Flatten, GroupByKey, and
> > > > features
> > > > > > > like
> > > > > > > > > Side Inputs, Windowing, and User Metrics.
> > > > > > > > > > >> > >> Basic windowing will be fully supported for batch even
> > > > > > > through
> > > > > > > > > lifted combines in the 2.32.0 release.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> All of the above enables Beam Go to be versatile for
> > > > batch
> > > > > > > > > execution on portable runners, and for simple streaming
> > > > pipelines.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Repo Testing
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> On precommit the Go SDK runs all it's unit tests. On
> > > > top of
> > > > > > > > > that, it runs all it's integration tests against the Python
> > > > Portable
> > > > > > > runner,
> > > > > > > > > > >> > >> making it quick and robust to detect breaking changes
> > > > without
> > > > > > > > > overspending community resources. Those same tests are also
> > > > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> The tests are executable against all runners via the
> > > > > > > appropriate
> > > > > > > > > Go commands (if you've stood up your own job management server),
> > > > > > > > > > >> > >> or Gradle commands (which will spin up runner
> > > > instances for
> > > > > > > > > you). Documentation for executing tests and adding new ones
> > > > > > > > > > >> > >> is on the wiki. [2] They are accessible to Go
> > > > developers as
> > > > > > > > > they're implemented with the standard Go testing tools.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Shortcomings
> > > > > > > > > > >> > >> That said, there's still much to do. Let me briefly
> > > > tell you
> > > > > > > > > what doesn't work, and it's up to you to weigh whether they block
> > > > > > > > > > >> > >> being out of experimental.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> At present, only a textio has been implemented as
> > > > Splittable
> > > > > > > > > DoFn.
> > > > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as
> > > > a the
> > > > > > > > > first example for future contributions for
> > > > > > > > > > >> > >> new transform wrappers for the Go SDK.
> > > > > > > > > > >> > >> Transforms and IOs are lacking, but at this point
> > > > users are
> > > > > > > > > empowered to write their own DoFns or wrap existing transforms
> > > > for
> > > > > > > Cross
> > > > > > > > > Language use.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> In the core SDK, more streaming focused features have
> > > > yet to
> > > > > > > be
> > > > > > > > > implemented, but they're largely additions to what exists already
> > > > > > > > > > >> > >> rather than total rebuilds. Much of the work is
> > > > definining
> > > > > > > how a
> > > > > > > > > user specifies their desires, and turning those into the
> > > > appropriate
> > > > > > > > > > >> > >> FnAPI requests at execution time. Back in October I
> > > > wrote at
> > > > > > > > > length on the wiki [1] what's missing for additional streaming
> > > > > > > features.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> While we have bolstered our testing recently, there's
> > > > likely
> > > > > > > > > still more we could test to improve our confidence in the SDK,
> > > > > > > > > > >> > >> in particular regarding the included transforms
> > > > libraries and
> > > > > > > > > examples.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> Moving Forward
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> My immediate plan is to work on incorporating the Go
> > > > SDK
> > > > > > > fully
> > > > > > > > > into the Beam Programming Guide. I've audited the guide [3], and
> > > > > > > > > > >> > >> am beginning to add missing content and filling in the
> > > > Go
> > > > > > > > > specific gaps. This will be tied to improving the Go Doc with
> > > > more Go
> > > > > > > > > > >> > >> specific user documentation that isn't appropriate for
> > > > the
> > > > > > > BPG.
> > > > > > > > > > >> > >> And resolving the LICENSE issue around the public
> > > > display of
> > > > > > > > > that GoDoc.
> > > > > > > > > > >> > >>
> > > > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will
> > > > > > > > > incorporate the SDK into the release process, and remove the
> > > > > > > "experimental"
> > > > > > > > > > >> > >> language around the SDK. This largely entails updating
> > > > the
> > > > > > > > > release scripts to also build and publish the Go SDK Docker
> > > > containers.
> > > > > > > > > > > >> > >> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
> > > > > > > > > > > >> > >>
> > > > > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control, which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the pkg.go.dev list at [4].
> > > > > > > > > > > >> > >>
> > > > > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and to provide additional links as needed. I intentionally avoided a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can, rather than that they shouldn't.
> > > > > > > > > > > >> > >>
> > > > > > > > > > > >> > >> Robert Burke
> > > > > > > > > > > >> > >> Defacto Beam Go TL
> > > > > > > > > > > >> > >>
> > > > > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> > > > > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
With Kyle's help adding the extra prefixed tags to the next RC, we have validated that this is the correct approach, at least for now.

https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam?tab=versions
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.34.0-RC1/go/pkg/beam

Or even:
https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam (links to the latest tagged version)

The main cost of this approach is doubling the number of tags in the tag list (https://github.com/apache/beam/tags), which is not ideal but overall a small cost. There's no need to fully publish these additional tags, so we won't be doubling our "releases" (see https://github.com/apache/beam/releases).

I'll still be filing a bug against the Go commands, since the mandatory prefixing is unintuitive and seems unnecessary. If it becomes unnecessary, we can always delete the tags from the affected branches and cease the behavior going forward. I'll search through the existing Go issues first, however, to see if this has been previously discussed, and report my findings here either way.

This requires two small changes to the release guide: the RC tagging script, and the final tagging step:
https://github.com/apache/beam/blob/243128a8fc52798e1b58b0cf1a271d95ee7aa241/release/src/main/scripts/choose_rc_commit.sh

https://github.com/apache/beam/blob/f8660d343fb218cb7acce81ddcc49de0710a0d14/website/www/site/content/en/contribute/release-guide.md#git-tag

I'll make this change later this week (or early next) assuming there are no objections.

Thank you all very much for your patience,
Robert Burke
Beam Go Busybody


On 2021/10/26 23:01:00, Robert Burke <lo...@apache.org> wrote: 
> After much reading of the Go Modules documentation, I have confirmed what the issue is.
> 
> We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change: it captures the Java and Python container boot code (written in Go) in the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.
> 
> However, we missed that, when doing so, the standard version tags only apply to a module at the repository root, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version; the important paragraph is:
> 
> > If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of the module path is not empty, then each tag name must be prefixed with the module subdirectory, followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.
> 
> Specifically, for the Go SDK to be fetchable at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
> 
> So, the fix for the Go versioning issue is to amend our Release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
> 
> I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can retroactively add such a prefixed tag to the 2.33.0 release branch. At that point I can also write the official Experimental Exit blog post.
> 
> Thank you all for your patience.
> Robert Burke
> 
> On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote: 
> > Thank you for the detailed update! Let us know if we can help.
> > 
> > On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:
> > 
> > > This is a status update.
> > >
> > > At this point 2.33.0 is released, but there are difficulties with
> > > accessing the tagged versions using the standard go tools. It's currently
> > > under investigation.
> > >
> > > Using the v2 path in a Go program and then running `go mod tidy` will populate
> > > the file with a pseudo-version rather than the latest tag (v2.33.0) (e.g.
> > > the line looks like
> > > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
> > > )
> > >
> > > While this will work, it's not the desired experience for users at this
> > > point. The current downside is that, for some reason, the releases are not
> > > meaningful targets. However, we retain the other benefits of Go Modules
> > > (actual dependency versioning, management by go tools).
> > >
> > > The issue is some combination of the go tooling [A], that we added a go
> > > mod file outside of the repo root [B], and that we did not increment the
> > > major version (v2 -> v3) when adding the go mod file [C].
> > >
> > > [B] From the go documentation, this should be legal and fine, even if it's
> > > not recommended. This is fortunate because the root of the repo would have
> > > played poorly with the root vendor directory, which the go tools have opinions
> > > on.
> > >
> > > [C] Incrementing the major version is recommended in the Go Modules
> > > documentation when transitioning to Go Modules. However, it never said it
> > > was required, nor did it indicate this current failure mode. If anything,
> > > this should be documented in those docs, if it's not another bug. We would
> > > not necessarily want to declare a global v3 for Beam at this time just for
> > > the Go SDK; it would become confusing rather quickly. Notionally, there are
> > > some larger breaking changes the Java and Python SDKs would want to make in
> > > such an event, and thus it's a larger conversation that is out of scope at
> > > this time.
> > >
> > > This leaves [A], where some misunderstanding of the documented semantics
> > > occurred. I certainly expected the tagged version of the non-root go-module
> > > to be inherited from the parent, not wholesale ignored. As a result, I'll
> > > be filing a bug against the go tools to determine this, and see what paths
> > > forward exist.
> > >
> > > It's my hope to resolve this before we write a proper Experimental Exit
> > > blog post for the Go SDK.
> > >
> > > Thank you for your patience, and time.
> > > Robert Burke
> > > Beam Go Busybody
> > >
> > >
> > >
> > >
> > > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
> > > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions. [2]
> > > >
> > > > The module file lives in the sdks/ directory, so there's a single Go module for the whole SDK, its tests and examples, and any support code for the container boot builds. This excludes the Go SDK code katas' [3] Go modules, which can be updated once 2.33.0 has been released.
> > > >
> > > > PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
> > > >
> > > > At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from the actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
> > > >
> > > > Cheers,
> > > > Robert Burke
> > > > Defacto Beam Go TL.
> > > >
> > > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> > > > [2] https://github.com/apache/beam/pull/15323
> > > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
> > > > [4] https://github.com/apache/beam/pull/15365
> > > >
> > > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
> > > > > +1, congratulations & thank you!
> > > > >
> > > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:
> > > > > >
> > > > > > Regarding documentation updates: the initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
> > > > > > JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513
> > > > > >
> > > > > >
> > > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > > > > Yup!
> > > > > > >
> > > > > > > My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide, and am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the Go Doc with more Go-specific user documentation that isn't appropriate for the BPG.
> > > > > > >
> > > > > > > My audit of the guide is here:
> > > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > > >
> > > > > > > The other sheets focus on features and tests. The feature page looks worse than it is, as it was more productive to focus on what isn't available than on what is. That's a snapshot of my actual working sheet, but I'll be updating it as needed.
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Oops, I forgot to write one question. Will this come with revamped website instructions/docs for golang too?
> > > > > > > >
> > > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Huge +1
> > > > > > > > >
> > > > > > > > > This is definitely something many people have asked about, so it is great to see it finally happening.
> > > > > > > > >
> > > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
> > > > > > > > > >
> > > > > > > > > > +1 awesome
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > > >>
> > > > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and the LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
> > > > > > > > > > >>
> > > > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder break between a newer generics-first API and the current version, but there's no rush. Generics/type parameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
> > > > > > > > > > >>
> > > > > > > > > > >> However, given the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, adding more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
> > > > > > > > > > >>
> > > > > > > > > > >> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply chaining approach, or at least prevents it from working simply.
> > > > > > > > > > >>
> > > > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
> > > > > > > > > > >>
> > > > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
> > > > > > > > > > >>
> > > > > > > > > > >> Cheers,
> > > > > > > > > > >> Robert Burke
> > > > > > > > > >>
> > > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
> > > > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules issues are solved. I don't think an SDK needs to support every feature to be accepted, especially now that we can do cross-language transforms, and Go definitely supports enough to be quite useful. (WRT streaming, my understanding is that Go supports the streaming model with windows and timestamps, and runs fine on a streaming runner, even if more advanced features like state and timers aren't yet available.)
> > > > > > > > > >> >
> > > > > > > > > >> > This is a great milestone.
> > > > > > > > > >> >
> > > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > WOW! Big news.
> > > > > > > > > > >> > >
> > > > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics it won't result in any backwards-incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
> 

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
After much reading of the Go Modules documentation, I have confirmed what the issue is.

We added the go.mod file to sdks/ under the repo root because it's a cleaner spot for the change: it captures the Java and Python container boot code (written in Go) in the module, and it avoids conflicting interpretations of the vendor directory that lives at the root level.

However, we missed that, when doing so, the standard version tags only apply to a module at the repository root, not to modules in subdirectories. See https://golang.org/ref/mod#vcs-version; the important paragraph is:

> If a module is defined in a subdirectory within the repository, that is, the module subdirectory portion of the module path is not empty, then each tag name must be prefixed with the module subdirectory, followed by a slash. For example, the module golang.org/x/tools/gopls is defined in the gopls subdirectory of the repository with root path golang.org/x/tools. The version v0.4.0 of that module must have the tag named gopls/v0.4.0 in that repository.

Specifically, for the Go SDK to be fetchable at the right version, we need prefixed tags like "sdks/v2.33.0" or "sdks/v2.34.0-RC1".
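The quoted rule can be expressed as a small Go function. This is a simplified, hypothetical sketch (`tagForModule` is not a Go toolchain or Beam API): it assumes the go.mod lives in the directory named by the module path minus any `/vN` major-version suffix, which matches the Beam layout described above (sdks/go.mod declaring `github.com/apache/beam/sdks/v2`). It ignores the alternative layout where the go.mod sits in a major-version subdirectory such as `sdks/v2/`.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major-version path element such as "/v2".
var majorSuffix = regexp.MustCompile(`/v[0-9]+$`)

// tagForModule returns the git tag the go command would look for, under the
// simplifying assumption that the module's go.mod lives in the module path's
// subdirectory with any "/vN" suffix removed (e.g. sdks/go.mod for
// github.com/apache/beam/sdks/v2).
func tagForModule(repoRoot, modulePath, version string) (string, error) {
	if modulePath != repoRoot && !strings.HasPrefix(modulePath, repoRoot+"/") {
		return "", fmt.Errorf("module %q is not inside repository %q", modulePath, repoRoot)
	}
	// Drop the major-version suffix, then the repository root, leaving the
	// module subdirectory (empty for a root-level module).
	subdir := strings.TrimPrefix(majorSuffix.ReplaceAllString(modulePath, ""), repoRoot)
	subdir = strings.TrimPrefix(subdir, "/")
	if subdir == "" {
		return version, nil // root module: the plain tag is enough
	}
	return subdir + "/" + version, nil
}

func main() {
	tag, err := tagForModule("github.com/apache/beam", "github.com/apache/beam/sdks/v2", "v2.33.0")
	fmt.Println(tag, err) // sdks/v2.33.0 <nil>
	tag, _ = tagForModule("golang.org/x/tools", "golang.org/x/tools/gopls", "v0.4.0")
	fmt.Println(tag) // gopls/v0.4.0
}
```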

So, the fix for the Go versioning issue is to amend our Release process (including generating Release Candidate builds) to also add a prefixed version tag with the same version.
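The tagging rule quoted above fits in a few lines. This is a minimal sketch, assuming the module lives in the `sdks` directory as described; `moduleTag` is a hypothetical helper for illustration, not part of Beam's release scripts. Note that the prefix is the module's directory (`sdks`), not the major-version suffix (`/v2`), which appears only in the module path declared in go.mod.

```go
package main

import (
	"fmt"
	"strings"
)

// moduleTag returns the VCS tag name the go command looks for when resolving
// a version of a module defined in subdir of the repository. An empty subdir
// means the module sits at the repository root, so the plain tag is used.
func moduleTag(subdir, version string) string {
	subdir = strings.Trim(subdir, "/")
	if subdir == "" {
		return version
	}
	return subdir + "/" + version
}

func main() {
	for _, v := range []string{"v2.33.0", "v2.34.0-RC1"} {
		fmt.Println(moduleTag("sdks", v))
	}
	fmt.Println(moduleTag("", "v2.33.0")) // root-level module: tag unchanged
}
```

Running it prints `sdks/v2.33.0`, `sdks/v2.34.0-RC1`, and then `v2.33.0` for the root-level case, which is why the existing unprefixed release tags alone were not enough.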

I can work with Kyle to validate this for 2.34.0 RC1, and if there are no objections we can retroactively add such a prefixed tag for the 2.33.0 release. At that point I can also write the official Experimental Exit blog post.

Thank you all for your patience.
Robert Burke

On 2021/10/14 00:00:53, Ahmet Altay <al...@google.com> wrote: 
> Thank you for the detailed update! Let us know if we can help.
> 
> On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:
> 
> > This is a status update.
> >
> > At this point 2.33.0 is released, but there are difficulties with
> > accessing the tagged versions using the standard go tools. It's currently
> > under investigation.
> >
> > Using the v2 path in a Go program and then running `go mod tidy` populates
> > go.mod with a pseudo-version rather than the latest tag (v2.33.0). For
> > example, the line looks like:
> > require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
> >
> > While this works, it's not the experience we want for users at this
> > point. The current downside is that, for some reason, the releases are
> > not meaningful version targets. However, we retain the other benefits of
> > Go Modules (actual dependency versioning, management by the go tools).
> >
> > The issue is some combination of the go tooling [A], the fact that we
> > added a go.mod file outside of the repo root [B], and that we did not
> > increment the major version (v2 -> v3) when adding the go.mod file [C].
> >
> > [B] From the Go documentation, this should be legal and fine, even if
> > it's not recommended. This is fortunate, because placing it at the root
> > of the repo would have played poorly with the root vendor directory,
> > which the go tools have opinions on.
> >
> > [C] Incrementing the major version is recommended in the Go Modules
> > documentation when transitioning to Go Modules. However, it never said it
> > was required, nor did it describe this failure mode. If anything, this
> > should be documented there, if it's not another bug. We would not
> > necessarily want to declare a global v3 for Beam at this time just for
> > the Go SDK; it would become confusing rather quickly. Notionally, the
> > Java and Python SDKs would want to make some larger breaking changes in
> > such an event, so it's a larger conversation that is out of scope for
> > now.
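Point [C] refers to Go's semantic import versioning rule: from v2 onward, the major version is part of the module path. A minimal sketch of that rule follows; `modulePath` is a hypothetical helper that ignores the special case for gopkg.in paths. It shows why a v3 bump would change the import path every SDK user writes.

```go
package main

import (
	"fmt"
	"strings"
)

// modulePath returns the path a module must declare for a given major version:
// v0 and v1 use the bare path, while v2 and later append a "/vN" suffix.
func modulePath(base string, major int) string {
	base = strings.TrimSuffix(base, "/")
	if major < 2 {
		return base
	}
	return fmt.Sprintf("%s/v%d", base, major)
}

func main() {
	base := "github.com/apache/beam/sdks"
	fmt.Println(modulePath(base, 2)) // current path
	fmt.Println(modulePath(base, 3)) // path a v3 bump would require
}
```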
> >
> > This leaves [A], where some misunderstanding of the documented semantics
> > occurred. I certainly expected the tagged version of the non-root Go
> > module to be inherited from the parent, not ignored wholesale. As a
> > result, I'll be filing a bug against the go tools to clarify this and
> > see what paths forward exist.
> >
> > It's my hope to resolve this before we write a proper Experimental Exit
> > blog post for the Go SDK.
> >
> > Thank you for your patience, and time.
> > Robert Burke
> > Beam Go Busybody
> >
> >
> >
> >
> > On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
> > > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go
> > Modules for dependency management, simplifying Go SDK contributions. [2]
> > >
> > > The Module file lives in the sdks/ directory so there's a single Go
> > Module for the whole SDK, tests, examples, and any support code for the
> > container boot builds. This excludes the Go SDK Code katas [3] go modules
> > which can be updated once 2.33.0 has been released.
> > >
> > > PR 15365 [4] adds the SDK containers back to the release builds, and
> > by default uses the release-specific container for Docker execution jobs.
> > For at least the 2.33.0 release, this means that manual validation will
> > need to explicitly specify RC versions of containers. However, given that
> > the Go SDK container and worker boot process rarely change, this is
> > unlikely to be an issue.
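To make "explicitly specify RC versions of containers" concrete, here is a hypothetical helper that derives an image reference from a release version. It assumes images are published as `apache/beam_go_sdk:<version>` and that release candidates use a `<version>rc<N>` tag; both naming details are assumptions to check against the release guide, and `sdkImage` is not an SDK function. The resulting string is the kind of value one would pass as the pipeline's environment configuration, for example via the `--environment_config` flag.

```go
package main

import (
	"fmt"
	"strings"
)

// sdkImage is hypothetical: it assumes the apache/beam_go_sdk repository name
// and an "rcN" tag suffix for release candidates (rc <= 0 means final release).
func sdkImage(version string, rc int) string {
	version = strings.TrimPrefix(version, "v")
	tag := version
	if rc > 0 {
		tag = fmt.Sprintf("%src%d", version, rc)
	}
	return "apache/beam_go_sdk:" + tag
}

func main() {
	fmt.Println(sdkImage("2.33.0", 0))  // apache/beam_go_sdk:2.33.0
	fmt.Println(sdkImage("v2.33.0", 1)) // apache/beam_go_sdk:2.33.0rc1
}
```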
> > >
> > > At present I'm cleaning up some of the references to experimental, and
> > making it clear that 2.33.0 is the first non-experimental release (even
> > though the actual release is 4-6 weeks out). CHANGES.md will be
> > updated to note the event, but a larger blog post will happen after the
> > release goes public.
> > >
> > > Cheers,
> > > Robert Burke
> > > De facto Beam Go TL.
> > >
> > > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> > > [2] https://github.com/apache/beam/pull/15323
> > > [3] https://github.com/apache/beam/tree/master/learning/katas/go
> > > [4] https://github.com/apache/beam/pull/15365
> > >
> > > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
> > > > +1, congratulations & thank you!
> > > >
> > > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:
> > > >
> > > > > Regarding the documentation update: the initial PR is
> > > > > https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
> > > > > JIRA link for Programming Guide changes:
> > > > > https://issues.apache.org/jira/browse/BEAM-12513
> > > > >
> > > > >
> > > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > > > Yup!
> > > > > >
> > > > > > My immediate plan is to work on incorporating the Go SDK fully into the
> > > > > > Beam Programming Guide. I've audited the guide, and
> > > > > > am beginning to add missing content and fill in the Go-specific gaps.
> > > > > > This will be tied to improving the Go Doc with more Go-specific
> > > > > > user documentation that isn't appropriate for the BPG.
> > > > > >
> > > > > > My audit of the guide is here:
> > > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > >
> > > > > > The other sheets focus on features and tests. The feature page looks
> > > > > > worse than it is, as it was more productive to focus on what isn't
> > > > > > available than what is. That's a snapshot of my actual working sheet,
> > > > > > but I'll be updating it as needed.
> > > > > >
> > > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > >
> > > > > > > Oops, I forgot to ask one question. Will this come with revamped
> > > > > > > website instructions/docs for golang too?
> > > > > > >
> > > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Huge +1
> > > > > > > >
> > > > > > > > This is definitely something many people have asked about, so
> > it is
> > > > > > > > great to see it finally happening.
> > > > > > > >
> > > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
> > > > > > > > >
> > > > > > > > > +1 awesome
> > > > > > > > >
> > > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > >>
> > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go
> > > > > modules
> > > > > > > and LICENSE issue) done before the 2.32 cut, and certainly
> > before the
> > > > > 2.33
> > > > > > > cut if release images aren't added to the 2.32 process.
> > > > > > > > >>
> > > > > > > > >> Regarding Go Generics: at some point in the future, we may
> > want a
> > > > > > > harder break between a newer generics-first API and the
> > current
> > > > > version,
> > > > > > > but there's no rush. Generics/TypeParameters in Go aren't
> > identical to
> > > > > the
> > > > > > > feature referred to by that term in Java, C++, Rust, etc, so
> > it'll
> > > > > take a
> > > > > > > bit of time for that expertise to develop.
> > > > > > > > >>
> > > > > > > > >> However, by the current nature of Go, we had to have pretty
> > > > > > > sophisticated reflective analysis to handle DoFns and map them
> > to their
> > > > > > > graph inputs. So, adding new helpers like a KV, emitter, and
> > Iterator
> > > > > > > types shouldn't be too difficult. Changing Go SDK internals to
> > use
> > > > > > > generics (like the implementation of Stats DoFns like Min, Max,
> > etc)
> > > > > would
> > > > > > > also be made transparently to most users, and
> > certainly any
> > > > > of
> > > > > > > the framework for execution time handling (the "worker's SDK
> > harness")
> > > > > > > would be able to be cleaned up if need be. Finally, adding more
> > > > > > > sophisticated DoFn registration and code generation would be
> > able to
> > > > > > > replace the optional code generator entirely, saving some users
> > a `go
> > > > > > > generate` step, simplifying getting improved execution
> > performance.
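As a rough illustration of the helpers described above, here is a sketch of what type-parameterized KV, emitter, and iterator types, plus a generic Min, could look like (it needs Go 1.18 or later). All names (`KV`, `Emitter`, `Iter`, `Ordered`, `Min`, `sliceIter`, `minPerKey`) are hypothetical and not part of the Beam Go SDK; the `func(T)` emitter and `func(*T) bool` iterator shapes mirror the parameter conventions the SDK's reflective DoFn analysis already recognizes.

```go
package main

import "fmt"

// KV is a hypothetical type-safe key/value pair.
type KV[K, V any] struct {
	Key   K
	Value V
}

// Emitter mirrors the func(T) shape used for DoFn outputs.
type Emitter[T any] func(T)

// Iter mirrors the func(*T) bool shape used for iterable inputs: it fills *T
// and reports whether a value was produced.
type Iter[T any] func(*T) bool

// Ordered is a local constraint so the sketch needs no packages beyond fmt.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// Min drains an iterator and returns the smallest value, with ok=false when
// the input is empty.
func Min[T Ordered](next Iter[T]) (best T, ok bool) {
	var v T
	for next(&v) {
		if !ok || v < best {
			best, ok = v, true
		}
	}
	return best, ok
}

// sliceIter adapts a slice to the Iter shape for local testing.
func sliceIter[T any](xs []T) Iter[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

// minPerKey is a DoFn-like function that uses the typed helpers together.
func minPerKey(key string, values Iter[int], emit Emitter[KV[string, int]]) {
	if m, ok := Min(values); ok {
		emit(KV[string, int]{Key: key, Value: m})
	}
}

func main() {
	minPerKey("a", sliceIter([]int{4, 2, 9}), func(kv KV[string, int]) {
		fmt.Println(kv.Key, kv.Value) // a 2
	})
}
```

The design point worth noting is that the typed helpers reuse the existing function shapes, which is one way additions like these could stay backwards compatible with current DoFn signatures.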
> > > > > > > > >>
> > > > > > > > >> Changing things like making a Type Parameterized
> > PCollection,
> > > > > would
> > > > > > > be far more involved, as would trying to use some kind of Apply
> > > > > format. The
> > > > > > > lack of Method Overrides prevents the apply chaining approach.
> > Or at
> > > > > least
> > > > > > > prevents it from working simply.
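A related Go restriction shows up as soon as one sketches a type-parameterized PCollection: methods cannot declare their own type parameters, so a method turning a collection of T into a collection of U cannot exist, and apply-style chaining falls back to nested package-level calls. The sketch below uses hypothetical in-memory `PCollection`, `ParDo`, and `Filter` types over plain slices; they are not Beam types, although the function-call style resembles the SDK's existing `beam.ParDo(s, fn, col)`.

```go
package main

import "fmt"

// PCollection is a hypothetical, in-memory stand-in for a typed collection.
type PCollection[T any] struct {
	elems []T
}

// A method such as
//
//	func (p PCollection[T]) Apply[U any](fn func(T) U) PCollection[U]
//
// does not compile: Go methods cannot introduce new type parameters.
// Transforms are therefore written as package-level generic functions.

// ParDo applies fn to every element, producing a collection of a new type.
func ParDo[T, U any](p PCollection[T], fn func(T) U) PCollection[U] {
	out := PCollection[U]{elems: make([]U, 0, len(p.elems))}
	for _, e := range p.elems {
		out.elems = append(out.elems, fn(e))
	}
	return out
}

// Filter keeps only the elements for which keep returns true.
func Filter[T any](p PCollection[T], keep func(T) bool) PCollection[T] {
	var out PCollection[T]
	for _, e := range p.elems {
		if keep(e) {
			out.elems = append(out.elems, e)
		}
	}
	return out
}

func main() {
	words := PCollection[string]{elems: []string{"go", "beam", "sdk"}}
	// Nested calls instead of words.Apply(...).Apply(...) chaining.
	lengths := Filter(ParDo(words, func(s string) int { return len(s) }),
		func(n int) bool { return n > 2 })
	fmt.Println(lengths.elems) // [4 3]
}
```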
> > > > > > > > >>
> > > > > > > > >> Finally, Go Generics won't be available until Go 1.18,
> > which isn't
> > > > > > > until next year. See https://blog.golang.org/generics-proposal
> > for
> > > > > > > details.
> > > > > > > > >>
> > > > > > > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does include a
> > Register
> > > > > > > calling convention, leading to a modest performance improvement
> > across
> > > > > the
> > > > > > > board.
> > > > > > > > >>
> > > > > > > > >> Cheers,
> > > > > > > > >> Robert Burke
> > > > > > > > >>
> > > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
> > > > > > > > >> > +1 to declaring Golang support out of experimental once
> > the Go
> > > > > > > Modules
> > > > > > > > >> > issues are solved. I don't think an SDK needs to support
> > every
> > > > > > > feature
> > > > > > > > >> > to be accepted, especially now that we can do
> > cross-language
> > > > > > > > >> > transforms, and Go definitely supports enough to be quite
> > > > > useful.
> > > > > > > (WRT
> > > > > > > > >> > streaming, my understanding is that Go supports the
> > streaming
> > > > > model
> > > > > > > > >> > with windows and timestamps, and runs fine on a streaming
> > > > > runner,
> > > > > > > even
> > > > > > > > >> > if more advanced features like state and timers aren't yet
> > > > > > > available.)
> > > > > > > > >> >
> > > > > > > > >> > This is a great milestone.
> > > > > > > > >> >
> > > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > > > >> > >
> > > > > > > > >> > > WOW! Big news.
> > > > > > > > >> > >
> > > > > > > > >> > > I'm supportive of leaving experimental status after Go
> > Modules
> > > > > > > are completed and the LICENSE issue is resolved. I don't think
> > that
> > > > > lacking
> > > > > > > streaming support is a blocker. The other thing I checked to see
> > was if
> > > > > > > there were metrics available on metrics.beam.apache.org,
> > specifically
> > > > > for
> > > > > > > measuring code health via post-commit over time, which there are
> > and
> > > > > the
> > > > > > > passing test rate is high (Huzzah!). The one thing that
> > surprised me
> > > > > from
> > > > > > > your summary is that when Go introduces generics it won't result
> > in any
> > > > > > > backwards incompatible changes in Apache Beam. That's great
> > news, but
> > > > > does
> > > > > > > it mean there will be a need to support both non-generic and
> > generic
> > > > > APIs
> > > > > > > moving forward? It seems like generics will be introduced in the
> > Go
> > > > > 1.17
> > > > > > > release (optimistically) in August this year.
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > >> > >
> > > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > > >> > >>
> > > > > > > > >> > >> Hello Beam Community!
> > > > > > > > >> > >>
> > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK
> > > > > experimental.
> > > > > > > > >> > >>
> > > > > > > > >> > >> This thread is to discuss it as a community, and any
> > > > > conditions
> > > > > > > that remain that would prevent the exit.
> > > > > > > > >> > >>
> > > > > > > > >> > >> tl;dr;
> > > > > > > > >> > >> Ask Questions for answers and links! I have both.
> > > > > > > > >> > >> This entails including it officially in the Release
> > process,
> > > > > > > removing the various "experimental" text throughout the repo etc,
> > > > > > > > >> > >> and otherwise treating it like Python and Java. Some Go
> > > > > specific
> > > > > > > tasks around dep versioning.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK implements the beam model efficiently for
> > most
> > > > > batch
> > > > > > > tasks, including basic windowing.
> > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested on all
> > > > > Portable
> > > > > > > runners.
> > > > > > > > >> > >> The core APIs are not going to change in incompatible
> > ways
> > > > > going
> > > > > > > forward.
> > > > > > > > >> > >> Scalable transforms can be written through
> > SplittableDoFns or
> > > > > > > via Cross Language transforms.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it
> > > > > experimental
> > > > > > > doesn't help with that any further.
> > > > > > > > >> > >> Communities grow through contributions and use, and
> > > > > experimental
> > > > > > > markers dissuade users.
> > > > > > > > > >> > >> There's plenty to do in order to expand what can be done
> > with
> > > > > the
> > > > > > > SDK. (Contributions welcome)
> > > > > > > > >> > >>
> > > > > > > > >> > >> Why Exit Experimental now?
> > > > > > > > >> > >>
> > > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's
> > > > > because
> > > > > > > there's a risk that API or behaviors may change significantly.
> > > > > > > > >> > >> This in turn, leads to additional work for users of
> > the SDK
> > > > > on
> > > > > > > every release which leads to sticking to older versions or
> > forking
> > > > > > > > >> > >> to preserve behavior. Version updates should be looked
> > > > > forward
> > > > > > > to, and viewed as having little risk. Further while there's been
> > > > > > > > > >> > >> previous discussion about what the "low bar" is for a
> > new
> > > > > SDK, it
> > > > > > > hasn't been formally applied to the Go SDK. I feel this has
> > > > > > > > >> > >> hurt development and contribution of new SDK languages
> > > > > (inherent
> > > > > > > difficulty of SDK development notwithstanding).
> > > > > > > > >> > >>
> > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear
> > what the
> > > > > > > Beam Model should look like in an opinionated language like Go.
> > > > > > > > >> > >> Their initial take (see
> > > > > > > https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into
> > detail
> > > > > what it
> > > > > > > means for a language without
> > > > > > > > >> > >> Generics, or overloading, or inheritance to implement
> > the
> > > > > beam
> > > > > > > model. One could largely throw away static types (like Python),
> > > > > > > > >> > >> but this approach rings hollow for Go. It would not do
> > if the
> > > > > > > approach couldn't grow and scale to the Beam Model. It's also
> > hard
> > > > > > > > >> > >> to tell if an API is any good before there are users.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Further, in the early days of Portability, there
> > wasn't a
> > > > > way to
> > > > > > > write scalable DoFns, dynamically or otherwise. It's an
> > incredible
> > > > > > > > >> > >> bottleneck to need to do all initial fanout of work on
> > a
> > > > > single
> > > > > > > machine, write everything to a Reshuffle, just in order to scale
> > up.
> > > > > > > > >> > >> Without being able to scale, Beam is little more than
> > > > > overhead.
> > > > > > > > >> > >>
> > > > > > > > >> > >> At this point, both of these needs are met within the
> > Go SDK
> > > > > for
> > > > > > > open source.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Background
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK has been a part of the beam repo for a few
> > years
> > > > > now,
> > > > > > > since it was accidentally merged into master.
> > > > > > > > >> > >> Since then it's been called experimental, and not
> > officially
> > > > > > > part of the releases.
> > > > > > > > >> > >>
> > > > > > > > > >> > >> Among the SDKs, it is the one that was always designed around Beam
> > Portability
> > > > > > > first. It never had any "Legacy" (SDK x Runner specific)
> > workers.
> > > > > > > > >> > >> It's always used the Beam Pipeline protos and FnAPI to
> > > > > execute
> > > > > > > jobs, first with some very experimental code on Dataflow, but now
> > > > > > > > >> > >> on all portable supported runners, like Flink, Spark,
> > the
> > > > > Python
> > > > > > > Portable runner, and Dataflow.
> > > > > > > > >> > >>
> > > > > > > > >> > >> API Stability
> > > > > > > > >> > >>
> > > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API
> > for DoFn
> > > > > > > and pipeline construction since it was first merged in, and
> > there are
> > > > > no
> > > > > > > > >> > >> changes to that on the horizon that can't be made in a
> > > > > backwards
> > > > > > > compatible manner. Largely these are related to New Features, or
> > > > > > > > >> > >> usability improvements enabled by the advent of Go
> > Generics
> > > > > > > (think of "real" KV, emitter, and iterator types).
> > > > > > > > >> > >>
> > > > > > > > >> > >> It's an open secret that the Go SDK has largely been
> > under
> > > > > work
> > > > > > > for use within Google. Its use there is called FlumeGo, representing
> > > > > > > > >> > >> the Apache Beam Go SDK, running on top of Flume,
> > Google's
> > > > > batch
> > > > > > > pipeline processing engine. Hence most of the focus has been on improving
> > > > > > > > >> > >> batch execution. FlumeGo sees ample use today, and
> > there
> > > > > hasn't
> > > > > > > been a call for fundamental changes to the API for ergonomic or
> > > > > > > > >> > >> usability concerns.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Scalability
> > > > > > > > >> > >>
> > > > > > > > >> > >> Google could get away without the Go SDK having an SDK
> > side
> > > > > > > scalability solution as a result of its integration with Flume.
> > > > > > > > >> > >> However, those days are now past.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with
> > Dynamic
> > > > > > > Splitting, which supports writing scalable batch transforms
> > natively
> > > > > > > > >> > >> in the Go SDK.
> > > > > > > > >> > >> The SDK also supports Cross Language Transforms, with
> > Beam
> > > > > > > Schema encodings. With it, production hardened transforms
> > > > > > > > >> > >> from Java and Python are a wrapper away.
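To illustrate what dynamic splitting means for an SDF, consider an offset-range restriction: when the runner asks for a split at some fraction of the remaining work, the worker keeps a primary range and returns a residual range that can be scheduled elsewhere. This is a simplified sketch; `offsetRange` and `trySplit` are hypothetical names, and the SDK's own restriction and tracker types have different signatures and more edge-case handling.

```go
package main

import (
	"fmt"
	"math"
)

// offsetRange is a half-open range [Start, End) of element offsets.
type offsetRange struct {
	Start, End int64
}

// trySplit splits the unclaimed part of r, [claimed+1, End), at the given
// fraction of the remaining work. claimed is the last offset already
// processed (Start-1 if none). It returns the primary range kept by the
// current worker, the residual handed back to the runner, and whether a
// split happened.
func trySplit(r offsetRange, claimed int64, fraction float64) (primary, residual offsetRange, ok bool) {
	next := claimed + 1
	if fraction < 0 || fraction >= 1 || next >= r.End {
		return r, offsetRange{}, false
	}
	remaining := r.End - next
	splitAt := next + int64(math.Ceil(float64(remaining)*fraction))
	if splitAt >= r.End {
		return r, offsetRange{}, false
	}
	return offsetRange{Start: r.Start, End: splitAt}, offsetRange{Start: splitAt, End: r.End}, true
}

func main() {
	p, res, ok := trySplit(offsetRange{Start: 0, End: 100}, 19, 0.5)
	fmt.Println(p, res, ok) // {0 60} {60 100} true
}
```

With 20 of 100 offsets already processed, a 0.5 split keeps [0, 60) on the current worker and hands [60, 100) back to the runner.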
> > > > > > > > >> > >>
> > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF
> > side
> > > > > work,
> > > > > > > and completed the Xlang work,) is adding a wrapper for the
> > > > > > > > >> > >> Java Kafka IO using Cross Language Transforms, which
> > is often
> > > > > > > been requested. This will also enable use of the Beam SQL
> > > > > > > > >> > >> transforms that java enables.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Features
> > > > > > > > >> > >>
> > > > > > > > > >> > >> The Go SDK implements the Beam core. The Go SDK
> > implements
> > > > > > > standard coders, allows for user DoFns, and CombineFns and access
> > > > > > > > >> > >> to core transforms like Flatten, GroupByKey, and
> > features
> > > > > like
> > > > > > > Side Inputs, Windowing, and User Metrics.
> > > > > > > > >> > >> Basic windowing will be fully supported for batch even
> > > > > through
> > > > > > > lifted combines in the 2.32.0 release.
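The CombineFn shape and the idea behind lifted combines can be shown without any SDK imports. The method names below (`CreateAccumulator`, `AddInput`, `MergeAccumulators`, `ExtractOutput`) follow the Go SDK's structural CombineFn convention; `meanFn` and the `liftedCombine` driver are hypothetical, the latter simulating what a runner does when it lifts a combine: partially combine each bundle before the shuffle, then merge only the accumulators afterwards.

```go
package main

import "fmt"

// meanAcc is the accumulator for meanFn.
type meanAcc struct {
	Sum   float64
	Count int64
}

// meanFn computes an average using the CombineFn method names.
type meanFn struct{}

func (meanFn) CreateAccumulator() meanAcc { return meanAcc{} }

func (meanFn) AddInput(a meanAcc, x float64) meanAcc {
	a.Sum += x
	a.Count++
	return a
}

func (meanFn) MergeAccumulators(a, b meanAcc) meanAcc {
	return meanAcc{Sum: a.Sum + b.Sum, Count: a.Count + b.Count}
}

func (meanFn) ExtractOutput(a meanAcc) float64 {
	if a.Count == 0 {
		return 0
	}
	return a.Sum / float64(a.Count)
}

// liftedCombine reduces each bundle to one accumulator before the simulated
// shuffle, then merges the accumulators and extracts the final output.
func liftedCombine(fn meanFn, bundles [][]float64) float64 {
	partials := make([]meanAcc, 0, len(bundles))
	for _, b := range bundles {
		acc := fn.CreateAccumulator()
		for _, x := range b {
			acc = fn.AddInput(acc, x)
		}
		partials = append(partials, acc)
	}
	merged := fn.CreateAccumulator()
	for _, p := range partials {
		merged = fn.MergeAccumulators(merged, p)
	}
	return fn.ExtractOutput(merged)
}

func main() {
	fmt.Println(liftedCombine(meanFn{}, [][]float64{{1, 2, 3}, {4, 5}})) // 3
}
```

Shipping one small accumulator per bundle instead of every element is what makes lifting worthwhile for batch jobs.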
> > > > > > > > >> > >>
> > > > > > > > >> > >> All of the above enables Beam Go to be versatile for
> > batch
> > > > > > > execution on portable runners, and for simple streaming
> > pipelines.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Repo Testing
> > > > > > > > >> > >>
> > > > > > > > > >> > >> On precommit, the Go SDK runs all its unit tests. On
> > top of
> > > > > > > that, it runs all its integration tests against the Python
> > Portable
> > > > > runner,
> > > > > > > > >> > >> making it quick and robust to detect breaking changes
> > without
> > > > > > > overspending community resources. Those same tests are also
> > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The tests are executable against all runners via the
> > > > > appropriate
> > > > > > > Go commands (if you've stood up your own job management server),
> > > > > > > > >> > >> or Gradle commands (which will spin up runner
> > instances for
> > > > > > > you). Documentation for executing tests and adding new ones
> > > > > > > > >> > >> is on the wiki. [2] They are accessible to Go
> > developers as
> > > > > > > they're implemented with the standard Go testing tools.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Shortcomings
> > > > > > > > >> > >> That said, there's still much to do. Let me briefly
> > tell you
> > > > > > > what doesn't work, and it's up to you to weigh whether they block
> > > > > > > > >> > >> being out of experimental.
> > > > > > > > >> > >>
> > > > > > > > > >> > >> At present, only textio has been implemented as a
> > Splittable
> > > > > > > DoFn.
> > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as
> > a the
> > > > > > > first example for future contributions for
> > > > > > > > >> > >> new transform wrappers for the Go SDK.
> > > > > > > > >> > >> Transforms and IOs are lacking, but at this point
> > users are
> > > > > > > empowered to write their own DoFns or wrap existing transforms
> > for
> > > > > Cross
> > > > > > > Language use.
> > > > > > > > >> > >>
> > > > > > > > >> > >> In the core SDK, more streaming focused features have
> > yet to
> > > > > be
> > > > > > > implemented, but they're largely additions to what exists already
> > > > > > > > >> > >> rather than total rebuilds. Much of the work is
> > defining
> > > > > how a
> > > > > > > user specifies their desires, and turning those into the
> > appropriate
> > > > > > > > >> > >> FnAPI requests at execution time. Back in October I
> > wrote at
> > > > > > > length on the wiki [1] what's missing for additional streaming
> > > > > features.
> > > > > > > > >> > >>
> > > > > > > > >> > >> While we have bolstered our testing recently, there's
> > likely
> > > > > > > still more we could test to improve our confidence in the SDK,
> > > > > > > > >> > >> in particular regarding the included transforms
> > libraries and
> > > > > > > examples.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Moving Forward
> > > > > > > > >> > >>
> > > > > > > > >> > >> My immediate plan is to work on incorporating the Go
> > SDK
> > > > > fully
> > > > > > > into the Beam Programming Guide. I've audited the guide [3], and
> > > > > > > > > >> > >> am beginning to add missing content and fill in the
> > Go
> > > > > > > specific gaps. This will be tied to improving the Go Doc with
> > more Go
> > > > > > > > >> > >> specific user documentation that isn't appropriate for
> > the
> > > > > BPG.
> > > > > > > > > >> > >> I'll also resolve the LICENSE issue around the public
> > display of
> > > > > > > that GoDoc.
> > > > > > > > >> > >>
> > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will
> > > > > > > incorporate the SDK into the release process, and remove the
> > > > > "experimental"
> > > > > > > > >> > >> language around the SDK. This largely entails updating
> > the
> > > > > > > release scripts to also build and publish the Go SDK Docker
> > containers.
> > > > > > > > >> > >> As for releasing the code, we're technically already
> > doing so
> > > > > > > whenever we tag a release branch [4].
> > > > > > > > >> > >>
> > > > > > > > >> > >> The clearest signal to the Go community however will be
> > > > > > > migrating the SDK to use Go Modules for dependency version
> > control,
> > > > > > > > >> > >> which Daniel is planning on working on after his Kafka
> > task.
> > > > > > > This will put our repo infrastructure, SDK contributors, and
> > users
> > > > > > > > >> > >> on the same footing when it comes to dependency
> > management.
> > > > > It
> > > > > > > will remove the "+incompatible" tags one sees on the
> > > > > > > > >> > >> pkg.go.dev list at [4].
> > > > > > > > >> > >>
> > > > > > > > >> > >> I'm very happy to answer any questions you might have
> > about
> > > > > the
> > > > > > > SDK, and provide additional links as needed. I intentionally
> > avoided
> > > > > > > > >> > >> a link barrage in this email, as they can distract
> > from the
> > > > > > > point: The SDK is ready for folks to use it, we need to tell
> > them that
> > > > > they
> > > > > > > can
> > > > > > > > >> > >> rather than they shouldn't.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Robert Burke
> > > > > > > > >> > >> Defacto Beam Go TL
> > > > > > > > >> > >>
> > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > > > > > >> > >> [1]
> > > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > > > > > >> > >> [2]
> > https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > > > > > >> > >> [3]
> > > > > > >
> > > > >
> > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > > > (SDK Audit sheet)
> > > > > > > > >> > >> [4]
> > > > > > >
> > > > >
> > https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > > > > > > >> >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> 

Re: [Proposal] Go SDK Exits Experimental

Posted by Ahmet Altay <al...@google.com>.
Thank you for the detailed update! Let us know if we can help.

On Wed, Oct 13, 2021 at 2:42 PM Robert Burke <lo...@apache.org> wrote:

> This is a status update.
>
> At this point 2.33.0 is released, but there are difficulties with
> accessing the tagged versions using the standard go tools. It's currently
> under investigation.
>
> Using the v2 path in a Go program and then running `go mod tidy` populates
> go.mod with a pseudo-version rather than the latest tag (v2.33.0). For
> example, the line looks like:
> require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008
>
> While this works, it's not the experience we want for users at this
> point. The current downside is that, for some reason, the releases are
> not meaningful version targets. However, we retain the other benefits of
> Go Modules (actual dependency versioning, management by the go tools).
>
> The issue is some combination of the go tooling [A], the fact that we
> added a go.mod file outside of the repo root [B], and that we did not
> increment the major version (v2 -> v3) when adding the go.mod file [C].
>
> [B] From the Go documentation, this should be legal and fine, even if
> it's not recommended. This is fortunate, because placing it at the root
> of the repo would have played poorly with the root vendor directory,
> which the go tools have opinions on.
>
> [C] Incrementing the major version is recommended in the Go Modules
> documentation when transitioning to Go Modules. However, it never said it
> was required, nor did it describe this failure mode. If anything, this
> should be documented there, if it's not another bug. We would not
> necessarily want to declare a global v3 for Beam at this time just for
> the Go SDK; it would become confusing rather quickly. Notionally, the
> Java and Python SDKs would want to make some larger breaking changes in
> such an event, so it's a larger conversation that is out of scope for
> now.
>
> This leaves [A], where some misunderstanding of the documented semantics
> occurred. I certainly expected the tagged version of the non-root Go
> module to be inherited from the parent, not ignored wholesale. As a
> result, I'll be filing a bug against the go tools to clarify this and
> see what paths forward exist.
>
> It's my hope to resolve this before we write a proper Experimental Exit
> blog post for the Go SDK.
>
> Thank you for your patience, and time.
> Robert Burke
> Beam Go Busybody
>
>
>
>
> On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote:
> > With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go
> Modules for dependency management, simplifying Go SDK contributions. [2]
> >
> > The Module file lives in the sdks/ directory so there's a single Go
> Module for the whole SDK, tests, examples, and any support code for the
> container boot builds. This excludes the Go SDK Code katas [3] go modules
> which can be updated once 2.33.0 has been released.
> >
> > PR 15365 [4] adds the SDK containers back to the release builds, and
> by default uses the release-specific container for Docker execution jobs.
> For at least the 2.33.0 release, this means that manual validation will
> need to explicitly specify RC versions of containers. However, given that
> the Go SDK container and worker boot process rarely change, this is
> unlikely to be an issue.
> >
> > At present I'm cleaning up some of the references to experimental, and
> making it clear that 2.33.0 is the first non-experimental release (even
> though the actual release is 4-6 weeks out). CHANGES.md will be
> updated to note the event, but a larger blog post will happen after the
> release goes public.
> >
> > Cheers,
> > Robert Burke
> > De facto Beam Go TL.
> >
> > [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> > [2] https://github.com/apache/beam/pull/15323
> > [3] https://github.com/apache/beam/tree/master/learning/katas/go
> > [4] https://github.com/apache/beam/pull/15365
> >
> > On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote:
> > > +1, congratulations & thank you!
> > >
> > > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:
> > >
> > > > Regarding the documentation update: the initial PR is
> > > > https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
> > > > JIRA link for Programming Guide changes:
> > > > https://issues.apache.org/jira/browse/BEAM-12513
> > > >
> > > >
> > > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > > Yup!
> > > > >
> > > > > My immediate plan is to work on incorporating the Go SDK fully into the
> > > > > Beam Programming Guide. I've audited the guide, and
> > > > > am beginning to add missing content and fill in the Go-specific gaps.
> > > > > This will be tied to improving the Go Doc with more Go-specific
> > > > > user documentation that isn't appropriate for the BPG.
> > > > >
> > > > > My audit of the guide is here:
> > > > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > >
> > > > > The other sheets focus on features and tests. The feature page looks
> > > > > worse than it is, as it was more productive to focus on what isn't
> > > > > available than what is. That's a snapshot of my actual working sheet,
> > > > > but I'll be updating it as needed.
> > > > >
> > > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > >
> > > > > > Oops, I forgot to ask one question. Will this come with revamped
> > > > > > website instructions/docs for golang too?
> > > > > >
> > > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > > >
> > > > > > > Huge +1
> > > > > > >
> > > > > > > This is definitely something many people have asked about, so it is
> > > > > > > great to see it finally happening.
> > > > > > >
> > > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <kenn@apache.org> wrote:
> > > > > > > >
> > > > > > > > +1 awesome
> > > > > > > >
> > > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > >>
> > > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
> > > > > > > > >>
> > > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder break between a newer, generics-first API and the current version, but there's no rush. Generics/TypeParameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
> > > > > > > > >>
> > > > > > > > >> However, by the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
> > > > > > > > >>
> > > > > > > > >> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of Method Overrides prevents the apply chaining approach, or at least prevents it from working simply.
> > > > > > > > >>
> > > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
> > > > > > > > >>
> > > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
> > > > > > > >>
> > > > > > > >> Cheers,
> > > > > > > >> Robert Burke
> > > > > > > >>
> > > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <robertwb@google.com> wrote:
> > > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules
> > > > > > > > >> > issues are solved. I don't think an SDK needs to support every feature
> > > > > > > > >> > to be accepted, especially now that we can do cross-language
> > > > > > > > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
> > > > > > > > >> > streaming, my understanding is that Go supports the streaming model
> > > > > > > > >> > with windows and timestamps, and runs fine on a streaming runner, even
> > > > > > > > >> > if more advanced features like state and timers aren't yet available.)
> > > > > > > >> >
> > > > > > > >> > This is a great milestone.
> > > > > > > >> >
> > > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > > >> > >
> > > > > > > >> > > WOW! Big news.
> > > > > > > >> > >
> > > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics it won't result in any backwards incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > >> > >
> > > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
> > > > > > > >> > >>
> > > > > > > >> > >> Hello Beam Community!
> > > > > > > >> > >>
> > > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > > > > > > >> > >>
> > > > > > > > >> > >> This thread is to discuss it as a community, and any conditions that remain that would prevent the exit.
> > > > > > > > >> > >>
> > > > > > > > >> > >> tl;dr;
> > > > > > > > >> > >> Ask Questions for answers and links! I have both.
> > > > > > > > >> > >> This entails including it officially in the Release process, removing the various "experimental" text throughout the repo etc,
> > > > > > > > >> > >> and otherwise treating it like Python and Java. Some Go specific tasks around dep versioning.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK implements the beam model efficiently for most batch tasks, including basic windowing.
> > > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested on all Portable runners.
> > > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
> > > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via Cross Language transforms.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
> > > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
> > > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
> > > > > > > > >> > >>
> > > > > > > > >> > >> Why Exit Experimental now?
> > > > > > > > >> > >>
> > > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's because there's a risk that API or behaviors may change significantly.
> > > > > > > > >> > >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking
> > > > > > > > >> > >> to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been
> > > > > > > > >> > >> previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has
> > > > > > > > >> > >> hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).
> > > > > > > > >> > >>
> > > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go.
> > > > > > > > >> > >> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail about what it means for a language without
> > > > > > > > >> > >> Generics, or overloading, or inheritance to implement the beam model. One could largely throw away static types (like Python),
> > > > > > > > >> > >> but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard
> > > > > > > > >> > >> to tell if an API is any good before there are users.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible
> > > > > > > > >> > >> bottleneck to need to do all initial fanout of work on a single machine, write everything to a Reshuffle, just in order to scale up.
> > > > > > > > >> > >> Without being able to scale, Beam is little more than overhead.
> > > > > > > > >> > >>
> > > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
> > > > > > > >> > >>
> > > > > > > >> > >> Background
> > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK has been a part of the beam repo for a few years now, since it was accidentally merged into master.
> > > > > > > > >> > >> Since then it's been called experimental, and not officially part of the releases.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Of the SDKs, it was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers.
> > > > > > > > >> > >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now
> > > > > > > > >> > >> on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
> > > > > > > > >> > >>
> > > > > > > > >> > >> API Stability
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no
> > > > > > > > >> > >> changes to that on the horizon that can't be made in a backwards compatible manner. Largely these are related to New Features, or
> > > > > > > > >> > >> usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).
> > > > > > > > >> > >>
> > > > > > > > >> > >> It's an open secret that the Go SDK has largely been under work for use within Google. Its use is called FlumeGo, representing
> > > > > > > > >> > >> the Apache Beam Go SDK, running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus has been on improving
> > > > > > > > >> > >> batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or
> > > > > > > > >> > >> usability concerns.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Scalability
> > > > > > > > >> > >>
> > > > > > > > >> > >> Google could get away without the Go SDK having an SDK side scalability solution as a result of its integration with Flume.
> > > > > > > > >> > >> However, those days are now past.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which supports writing scalable batch transforms natively
> > > > > > > > >> > >> in the Go SDK.
> > > > > > > > >> > >> The SDK also supports Cross Language Transforms, with Beam Schema encodings. With it, production hardened transforms
> > > > > > > > >> > >> from Java and Python are a wrapper away.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side work and completed the Xlang work) is adding a wrapper for the
> > > > > > > > >> > >> Java Kafka IO using Cross Language Transforms, which has often been requested. This will also enable use of the Beam SQL
> > > > > > > > >> > >> transforms that Java enables.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Features
> > > > > > > > >> > >>
> > > > > > > > >> > >> The Go SDK implements the Beam core. The Go SDK implements standard coders, allows for user DoFns and CombineFns, and access
> > > > > > > > >> > >> to core transforms like Flatten, GroupByKey, and features like Side Inputs, Windowing, and User Metrics.
> > > > > > > > >> > >> Basic windowing will be fully supported for batch even through lifted combines in the 2.32.0 release.
> > > > > > > > >> > >>
> > > > > > > > >> > >> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
> > > > > > > >> > >>
> > > > > > > >> > >> Repo Testing
> > > > > > > >> > >>
> > > > > > > > >> > >> On precommit the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner,
> > > > > > > > >> > >> making it quick and robust to detect breaking changes without overspending community resources. Those same tests are also
> > > > > > > > >> > >> run against Dataflow, Flink, and Spark.
> > > > > > > > >> > >>
> > > > > > > > >> > >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server),
> > > > > > > > >> > >> or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones
> > > > > > > > >> > >> is on the wiki. [2] They are accessible to Go developers as they're implemented with the standard Go testing tools.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Shortcomings
> > > > > > > > >> > >> That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether they block
> > > > > > > > >> > >> being out of experimental.
> > > > > > > > >> > >>
> > > > > > > > >> > >> At present, only textio has been implemented as a Splittable DoFn.
> > > > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of
> > > > > > > > >> > >> new transform wrappers for the Go SDK.
> > > > > > > > >> > >> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.
> > > > > > > > >> > >>
> > > > > > > > >> > >> In the core SDK, more streaming focused features have yet to be implemented, but they're largely additions to what exists already
> > > > > > > > >> > >> rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate
> > > > > > > > >> > >> FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] about what's missing for additional streaming features.
> > > > > > > > >> > >>
> > > > > > > > >> > >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK,
> > > > > > > > >> > >> in particular regarding the included transforms libraries and examples.
> > > > > > > >> > >>
> > > > > > > >> > >> Moving Forward
> > > > > > > >> > >>
> > > > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and
> > > > > > > > >> > >> am beginning to add missing content and fill in the Go specific gaps. This will be tied to improving the Go Doc with more Go
> > > > > > > > >> > >> specific user documentation that isn't appropriate for the BPG.
> > > > > > > > >> > >> The plan also includes resolving the LICENSE issue around the public display of that GoDoc.
> > > > > > > > >> > >>
> > > > > > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental"
> > > > > > > > >> > >> language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers.
> > > > > > > > >> > >> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
> > > > > > > > >> > >>
> > > > > > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control,
> > > > > > > > >> > >> which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users
> > > > > > > > >> > >> on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the
> > > > > > > > >> > >> pkg.go.dev list at [4].
> > > > > > > > >> > >>
> > > > > > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided
> > > > > > > > >> > >> a link barrage in this email, as they can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can,
> > > > > > > > >> > >> rather than that they shouldn't.
> > > > > > > > >> > >>
> > > > > > > > >> > >> Robert Burke
> > > > > > > > >> > >> Defacto Beam Go TL
> > > > > > > > >> > >>
> > > > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > > > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > > > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> > > > > > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > > > > > >> >
> > > > > >
> > > > >
> > > >
> > >
> >
>
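Several messages in the thread above mention that Go generics (expected in Go 1.18) could enable "real" KV, emitter, and iterator types without breaking the existing API. As a rough illustration only, here is what such helpers might look like in plain Go 1.18+ code. KV, Emitter, Iterator, FromSlice and sumPerKey are hypothetical names invented for this sketch, not the Beam Go SDK's API; the non-generic SDK discussed in the thread expresses the same ideas with plain function-typed parameters (for example an iterator of type func(*int) bool), which it inspects via reflection. Wrapping those shapes in type-parameterized named types keeps the calling convention while letting the compiler check element types.

```go
package main

import "fmt"

// KV is a hypothetical generic key/value pair (not the Beam SDK's type).
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Emitter is a hypothetical typed output callback: each call emits one element.
type Emitter[T any] func(T)

// Iterator is a hypothetical typed iterator with the same shape as a
// non-generic func(*T) bool iterator: it fills *T and reports whether
// a value was produced.
type Iterator[T any] func(*T) bool

// FromSlice builds an Iterator over a slice, which is handy for tests.
func FromSlice[T any](vals []T) Iterator[T] {
	i := 0
	return func(out *T) bool {
		if i >= len(vals) {
			return false
		}
		*out = vals[i]
		i++
		return true
	}
}

// sumPerKey is a DoFn-shaped function: it consumes the grouped values
// for one key and emits a single KV holding their sum.
func sumPerKey(key string, values Iterator[int], emit Emitter[KV[string, int]]) {
	total := 0
	var v int
	for values(&v) {
		total += v
	}
	emit(KV[string, int]{Key: key, Value: total})
}

func main() {
	sumPerKey("a", FromSlice([]int{1, 2, 3}), func(kv KV[string, int]) {
		fmt.Printf("%s=%d\n", kv.Key, kv.Value) // prints a=6
	})
}
```

Because the generic types share their underlying shapes with the function types used today, a design along these lines could in principle sit alongside the existing reflective DoFn analysis rather than replacing it.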

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
This is a status update.

At this point 2.33.0 is released, but there are difficulties accessing the tagged versions using the standard Go tools. This is currently under investigation.

Importing the v2 module path in a Go program and then running `go mod tidy` populates go.mod with a pseudo-version rather than the latest tag (v2.33.0); e.g. the line looks like:
require github.com/apache/beam/sdks/v2 v2.0.0-20211013181004-a9120e083008

While this works, it's not the desired experience for users at this point. The current downside is that, for some reason, the releases are not meaningful targets. However, we retain the other benefits of Go Modules (actual dependency versioning, management by the go tools).

The issue is some combination of the go tooling [A], the fact that we added a go.mod file outside of the repo root [B], and the fact that we did not increment the major version (v2 -> v3) when adding the go.mod file [C].

[B] According to the Go documentation, this should be legal and fine, even if it's not recommended. This is fortunate, because a module at the repo root would have played poorly with the root vendor directory, which the go tools have opinions on.

[C] The Go Modules documentation recommends incrementing the major version when transitioning to Go Modules. However, it never said this was required, nor did it indicate this failure mode. If anything, this should be documented there, if it's not another bug. We would not necessarily want to declare a global v3 for Beam at this time just for the Go SDK; that would become confusing rather quickly. Notionally, there are some larger breaking changes the Java and Python SDKs would want to make in such an event, so it's a larger conversation that is out of scope at this time.

This leaves [A], where some misunderstanding of the documented semantics occurred. I certainly expected the tagged version of the non-root Go module to be inherited from the parent, not ignored wholesale. As a result, I'll be filing a bug against the go tools to determine this and see what paths forward exist.
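Some background that may help with [A]: the Go modules reference documents that when a module is defined in a subdirectory of its repository, the go command only considers version tags prefixed with that subdirectory path, and the trailing /vN major-version suffix is not part of that prefix. Under that rule, a module at sdks/ with path github.com/apache/beam/sdks/v2 would be resolved from a tag such as sdks/v2.33.0 rather than the repository-wide v2.33.0, which would be consistent with the pseudo-version behavior reported here; whether that is the whole story is for the bug investigation to confirm. The sketch below derives the expected tag name under that rule. tagForVersion is a hypothetical helper, and it ignores gopkg.in-style .vN suffixes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// majorSuffix matches a trailing major-version path element: /v2 ... /v9, /v10, ...
var majorSuffix = regexp.MustCompile(`/v([2-9]|[1-9][0-9]+)$`)

// tagForVersion is a hypothetical helper: given a module path, the import
// path of the repository root, and a semantic version, it returns the VCS
// tag name the go command would look for under the subdirectory rule.
func tagForVersion(modulePath, repoRoot, version string) (string, error) {
	// Drop the major-version suffix; it names the module, not a tag prefix.
	prefix := majorSuffix.ReplaceAllString(modulePath, "")
	if prefix == repoRoot {
		return version, nil // module at the repository root: plain tag
	}
	if !strings.HasPrefix(prefix, repoRoot+"/") {
		return "", fmt.Errorf("module %q is not inside repository %q", modulePath, repoRoot)
	}
	subdir := strings.TrimPrefix(prefix, repoRoot+"/")
	return subdir + "/" + version, nil
}

func main() {
	tag, err := tagForVersion("github.com/apache/beam/sdks/v2", "github.com/apache/beam", "v2.33.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tag) // sdks/v2.33.0
}
```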

I hope to resolve this before we write a proper Experimental Exit blog post for the Go SDK.

Thank you for your patience and time.
Robert Burke
Beam Go Busybody




On 2021/08/23 18:12:00, Robert Burke <lo...@apache.org> wrote: 
> With 2.32 the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, simplifying Go SDK contributions. [2]
> 
> The module file lives in the sdks/ directory, so there's a single Go module for the whole SDK, tests, examples, and any support code for the container boot builds. This excludes the Go SDK code katas [3], whose Go modules can be updated once 2.33.0 has been released.
> 
> PR 15365 [4] adds the SDK containers back to the release builds, and by default uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this does mean that manual validation will need to explicitly specify RC versions of containers. However, given that the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
> 
> At present I'm cleaning up some of the references to experimental, and making it clear that 2.33.0 is the first non-experimental release (even though that's 4-6 weeks out from the actual release). CHANGES.md will be updated to note the event, but a larger blog post will happen after the release goes public.
> 
> Cheers,
> Robert Burke
> Defacto Beam Go TL.
> 
> [1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
> [2] https://github.com/apache/beam/pull/15323
> [3] https://github.com/apache/beam/tree/master/learning/katas/go
> [4] https://github.com/apache/beam/pull/15365
> 
> On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote: 
> > +1, congratulations & thank you!
> > 
> > On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:
> > 
> > > Regarding documentation update: Initial PR is
> > > https://github.com/apache/beam/pull/15057 which goes up to section ~4.3.
> > > JIRA link for Programming Guide changes:
> > > https://issues.apache.org/jira/browse/BEAM-12513
> > >
> > >
> > > On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > > > Yup!
> > > >
> > > > My immediate plan is to work on incorporating the Go SDK fully into the
> > > > Beam Programming Guide. I've audited the guide, and
> > > > am beginning to add missing content and fill in the Go specific gaps.
> > > > This will be tied to improving the Go Doc with more Go
> > > > specific user documentation that isn't appropriate for the BPG.
> > > >
> > > > My audit of the guide is here:
> > > >
> > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > >
> > > > The other sheets focus on features and tests. The feature page looks worse
> > > > than it is, as it was more productive to focus on what isn't available than
> > > > what is. That's a snapshot of my actual working sheet but I'll be updating
> > > > it as needed.
> > > >
> > > > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> > > >
> > > > > Oops, forgot to write one question. Will this come with revamped
> > > > > website instructions/doc for golang too?
> > > > >
> > > > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > > > >
> > > > > > Huge +1
> > > > > >
> > > > > > This is definitely something many people have asked about, so it is
> > > > > > great to see it finally happening.
> > > > > >
> > > > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <ke...@apache.org> wrote:
> > > > > > >
> > > > > > > +1 awesome
> > > > > > >
> > > > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > > > >>
> > > > > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
> > > > > > > >>
> > > > > > > >> Regarding Go Generics: at some point in the future, we may want a harder break between a newer, generics-first API and the current version, but there's no rush. Generics/TypeParameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
> > > > > > > >>
> > > > > > > >> However, by the current nature of Go, we had to have pretty sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
> > > > > > > >>
> > > > > > > >> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of Method Overrides prevents the apply chaining approach, or at least prevents it from working simply.
> > > > > > > >>
> > > > > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
> > > > > > > >>
> > > > > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention, leading to a modest performance improvement across the board.
> > > > > > >>
> > > > > > >> Cheers,
> > > > > > >> Robert Burke
> > > > > > >>
> > > > > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote:
> > > > > > > >> > +1 to declaring Golang support out of experimental once the Go Modules
> > > > > > > >> > issues are solved. I don't think an SDK needs to support every feature
> > > > > > > >> > to be accepted, especially now that we can do cross-language
> > > > > > > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
> > > > > > > >> > streaming, my understanding is that Go supports the streaming model
> > > > > > > >> > with windows and timestamps, and runs fine on a streaming runner, even
> > > > > > > >> > if more advanced features like state and timers aren't yet available.)
> > > > > > >> >
> > > > > > >> > This is a great milestone.
> > > > > > >> >
> > > > > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > > > >> > >
> > > > > > >> > > WOW! Big news.
> > > > > > >> > >
> > > > > > > >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics it won't result in any backwards incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
> > > > > > >> > >>
> > > > > > >> > >> Hello Beam Community!
> > > > > > >> > >>
> > > > > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > > > > > >> > >>
> > > > > > > >> > >> This thread is to discuss it as a community, and any conditions that remain that would prevent the exit.
> > > > > > > >> > >>
> > > > > > > >> > >> tl;dr;
> > > > > > > >> > >> Ask Questions for answers and links! I have both.
> > > > > > > >> > >> This entails including it officially in the Release process, removing the various "experimental" text throughout the repo etc,
> > > > > > > >> > >> and otherwise treating it like Python and Java. Some Go specific tasks around dep versioning.
> > > > > > > >> > >>
> > > > > > > >> > >> The Go SDK implements the beam model efficiently for most batch tasks, including basic windowing.
> > > > > > > >> > >> Apache Beam Go jobs can execute, and are tested on all Portable runners.
> > > > > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
> > > > > > > >> > >> Scalable transforms can be written through SplittableDoFns or via Cross Language transforms.
> > > > > > > >> > >>
> > > > > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
> > > > > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
> > > > > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
> > > > > > > >> > >>
> > > > > > > >> > >> Why Exit Experimental now?
> > > > > > > >> > >>
> > > > > > > >> > >> Typically when we call an SDK or API Experimental, it's because there's a risk that API or behaviors may change significantly.
> > > > > > > >> > >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking
> > > > > > > >> > >> to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been
> > > > > > > >> > >> previous discussion about what the "low bar" is for a new SDK, it hasn't been summarily applied to the Go SDK. I feel this has
> > > > > > > >> > >> hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).
> > > > > > > >> > >>
> > > > > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go.
> > > > > > > >> > >> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail about what it means for a language without
> > > > > > > >> > >> Generics, or overloading, or inheritance to implement the beam model. One could largely throw away static types (like Python),
> > > > > > > >> > >> but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard
> > > > > > > >> > >> to tell if an API is any good before there are users.
> > > > > > > >> > >>
> > > > > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible
> > > > > > > >> > >> bottleneck to need to do all initial fanout of work on a single machine, write everything to a Reshuffle, just in order to scale up.
> > > > > > > >> > >> Without being able to scale, Beam is little more than overhead.
> > > > > > > >> > >>
> > > > > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
> > > > > > > >> > >>
> > > > > > > >> > >> Background
> > > > > > > >> > >>
> > > > > > > >> > >> The Go SDK has been a part of the beam repo for a few years now, since it was accidentally merged into master.
> > > > > > > >> > >> Since then it's been called experimental, and not officially part of the releases.
> > > > > > > >> > >>
> > > > > > > >> > >> Of the SDKs, it was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers.
> > > > > > > >> > >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now
> > > > > > > >> > >> on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
> > > > > > >> > >>
> > > > > > >> > >> API Stability
> > > > > > >> > >>
> > > > > > >> > >> The Go SDK hasn't meaningfully changed it's user API for DoFn
> > > > > and pipeline construction since it was first merged in, and there are
> > > no
> > > > > > >> > >> changes to that on the horizon that can't be made in a
> > > backwards
> > > > > compatible manner. Largely these are related to New Features, or
> > > > > > >> > >> usability improvements enabled by the advent of Go Generics
> > > > > (think of "real" KV, emitter, and iterator types).
> > > > > > >> > >>
> > > > > > >> > >> It's an open secret that the Go SDK has largely been under
> > > work
> > > > > for use within Google. It's use is called FlumeGo, representing
> > > > > > >> > >> the Apache Beam Go SDK, running on top of Flume, Google's
> > > batch
> > > > > pipeline processing engine. Thus most of the focus on improving
> > > > > > >> > >> batch execution. FlumeGo sees ample use today, and there
> > > hasn't
> > > > > been a call for fundamental changes to the API for ergonomic or
> > > > > > >> > >> usability concerns.
> > > > > > >> > >>
> > > > > > >> > >> Scalability
> > > > > > >> > >>
> > > > > > >> > >> Google could get away without the Go SDK having an SDK side
> > > > > scalability solution as a result of it's integration with Flume.
> > > > > > >> > >> However, those days are now past.
> > > > > > >> > >>
> > > > > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic
> > > > > Splitting, which supports writing scalable batch transforms natively
> > > > > > >> > >> in the Go SDK.
> > > > > > >> > >> The SDK also supports Cross Language Transforms, with Beam
> > > > > Schema encodings. With it, production hardened transforms
> > > > > > >> > >> from Java and Python are a wrapper away.
> > > > > > >> > >>
> > > > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side
> > > work,
> > > > > and completed the Xlang work,) is adding a wrapper for the
> > > > > > >> > >> Java Kafka IO using Cross Language Transforms, which is often
> > > > > been requested. This will also enable use of the Beam SQL
> > > > > > >> > >> transforms that java enables.
> > > > > > >> > >>
> > > > > > >> > >> Features
> > > > > > >> > >>
> > > > > > >> > >> The Go SDK implements the Beam C=core. The Go SDK implements
> > > > > standard coders, allows for user DoFns, and CombineFns and access
> > > > > > >> > >> to core transforms like Flatten, GroupByKey, and features
> > > like
> > > > > Side Inputs, Windowing, and User Metrics.
> > > > > > >> > >> Basic windowing will be fully supported for batch even
> > > through
> > > > > lifted combines in the 2.32.0 release.
> > > > > > >> > >>
> > > > > > >> > >> All of the above enables Beam Go to be versatile for batch
> > > > > execution on portable runners, and for simple streaming pipelines.
> > > > > > >> > >>
> > > > > > >> > >> Repo Testing
> > > > > > >> > >>
> > > > > > >> > >> On precommit the Go SDK runs all it's unit tests. On top of
> > > > > that, it runs all it's integration tests against the Python Portable
> > > runner,
> > > > > > >> > >> making it quick and robust to detect breaking changes without
> > > > > overspending community resources. Those same tests are also
> > > > > > >> > >> run against Dataflow, Flink, and Spark.
> > > > > > >> > >>
> > > > > > >> > >> The tests are executable against all runners via the
> > > appropriate
> > > > > Go commands (if you've stood up your own job management server),
> > > > > > >> > >> or Gradle commands (which will spin up runner instances for
> > > > > you). Documentation for executing tests and adding new ones
> > > > > > >> > >> is on the wiki. [2] They are accessible to Go developers as
> > > > > they're implemented with the standard Go testing tools.
> > > > > > >> > >>
> > > > > > >> > >> Shortcomings
> > > > > > >> > >> That said, there's still much to do. Let me briefly tell you
> > > > > what doesn't work, and it's up to you to weigh whether they block
> > > > > > >> > >> being out of experimental.
> > > > > > >> > >>
> > > > > > >> > >> At present, only a textio has been implemented as Splittable
> > > > > DoFn.
> > > > > > >> > >> Once the Kafka wrapper is merged in, it will serve as a the
> > > > > first example for future contributions for
> > > > > > >> > >> new transform wrappers for the Go SDK.
> > > > > > >> > >> Transforms and IOs are lacking, but at this point users are
> > > > > empowered to write their own DoFns or wrap existing transforms for
> > > Cross
> > > > > Language use.
> > > > > > >> > >>
> > > > > > >> > >> In the core SDK, more streaming focused features have yet to
> > > be
> > > > > implemented, but they're largely additions to what exists already
> > > > > > >> > >> rather than total rebuilds. Much of the work is definining
> > > how a
> > > > > user specifies their desires, and turning those into the appropriate
> > > > > > >> > >> FnAPI requests at execution time. Back in October I wrote at
> > > > > length on the wiki [1] what's missing for additional streaming
> > > features.
> > > > > > >> > >>
> > > > > > >> > >> While we have bolstered our testing recently, there's likely
> > > > > still more we could test to improve our confidence in the SDK,
> > > > > > >> > >> in particular regarding the included transforms libraries and
> > > > > examples.
> > > > > > >> > >>
> > > > > > >> > >> Moving Forward
> > > > > > >> > >>
> > > > > > >> > >> My immediate plan is to work on incorporating the Go SDK
> > > fully
> > > > > into the Beam Programming Guide. I've audited the guide [3], and
> > > > > > >> > >> am beginning to add missing content and filling in the Go
> > > > > specific gaps. This will be tied to improving the Go Doc with more Go
> > > > > > >> > >> specific user documentation that isn't appropriate for the
> > > BPG.
> > > > > > >> > >> And resolving the LICENSE issue around the public display of
> > > > > that GoDoc.
> > > > > > >> > >>
> > > > > > >> > >> If this proposal is accepted by a binding vote, I will
> > > > > incorporate the SDK into the release process, and remove the
> > > "experimental"
> > > > > > >> > >> language around the SDK. This largely entails updating the
> > > > > release scripts to also build and publish the Go SDK Docker containers.
> > > > > > >> > >> As for releasing the code, we're technically already doing so
> > > > > whenever we tag a release branch [4].
> > > > > > >> > >>
> > > > > > >> > >> The clearest signal to the Go community however will be
> > > > > migrating the SDK to use Go Modules for dependency version control,
> > > > > > >> > >> which Daniel is planning on working on after his Kafka task.
> > > > > This will put our repo infrastructure, SDK contributors, and users
> > > > > > >> > >> on the same footing when it comes to dependency management.
> > > It
> > > > > will remove the "+incompatible" tags one sees on the
> > > > > > >> > >> pkg.go.dev list at [4].
> > > > > > >> > >>
> > > > > > >> > >> I'm very happy to answer any questions you might have about
> > > the
> > > > > SDK, and provide additional links as needed. I intentionally avoided
> > > > > > >> > >> a link barrage in this email, as they can distract from the
> > > > > point: The SDK is ready for folks to use it, we need to tell them that
> > > they
> > > > > can
> > > > > > >> > >> rather than they shouldn't.
> > > > > > >> > >>
> > > > > > >> > >> Robert Burke
> > > > > > >> > >> Defacto Beam Go TL
> > > > > > >> > >>
> > > > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > > > >> > >> [1]
> > > > >
> > > https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > > > >> > >> [3]
> > > > >
> > > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> > > > > (SDK Audit sheet)
> > > > > > >> > >> [4]
> > > > >
> > > https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > > > > >> >
> > > > >
> > > >
> > >
> > 
> 

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
With 2.32, the LICENSE issue has been fixed [1], and the SDK now uses Go Modules for dependency management, which simplifies Go SDK contributions [2].

The module file (go.mod) lives in the sdks/ directory, so there's a single Go module for the whole SDK, its tests and examples, and any support code for the container boot builds. This excludes the Go modules for the Go SDK code katas [3], which can be updated once 2.33.0 has been released.

PR 15365 [4] adds the SDK containers back to the release builds and, by default, uses the release-specific container for Docker execution jobs. For at least the 2.33.0 release, this means manual validation will need to explicitly specify the RC versions of the containers. However, since the Go SDK container and worker boot process rarely change, this is unlikely to be an issue.
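
For manual RC validation, the key step is naming the candidate container explicitly instead of relying on the release-default image. A minimal sketch, assuming the Go SDK container is published as apache/beam_go_sdk with RC tags of the form <version>rc<N>, and that the image is passed through the SDK's --environment_type/--environment_config job flags; confirm the repository, tag format, and flags against the release guide before relying on them. goSDKImage is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// goSDKRepo is an ASSUMPTION: the Docker repository for the Go SDK
// harness container.
const goSDKRepo = "apache/beam_go_sdk"

// goSDKImage is a hypothetical helper that builds an image reference for a
// release (rc == 0) or a release candidate (rc > 0). The "<version>rc<N>"
// tag format is an assumption.
func goSDKImage(version string, rc int) string {
	tag := strings.TrimPrefix(version, "v")
	if rc > 0 {
		tag = fmt.Sprintf("%src%d", tag, rc)
	}
	return goSDKRepo + ":" + tag
}

func main() {
	// Flags a validator might append to a Go pipeline invocation to pin
	// the RC container instead of the release default.
	fmt.Printf("--environment_type=DOCKER --environment_config=%s\n",
		goSDKImage("v2.33.0", 1))
}
```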

At present I'm cleaning up some of the references to "experimental" and making it clear that 2.33.0 is the first non-experimental release (even though it's 4-6 weeks from actual release). CHANGES.md will be updated to note the event, and a larger blog post will follow once the release is public.

Cheers,
Robert Burke
De facto Beam Go TL.

[1] https://pkg.go.dev/github.com/apache/beam@v2.32.0-RC1+incompatible/sdks/go/pkg/beam
[2] https://github.com/apache/beam/pull/15323
[3] https://github.com/apache/beam/tree/master/learning/katas/go
[4] https://github.com/apache/beam/pull/15365

On 2021/06/28 23:12:19, Ahmet Altay <al...@google.com> wrote: 
> +1, congratulations & thank you!

Re: [Proposal] Go SDK Exits Experimental

Posted by Ahmet Altay <al...@google.com>.
+1, congratulations & thank you!

On Tue, Jun 22, 2021 at 3:15 PM Robert Burke <lo...@apache.org> wrote:

> Regarding the documentation update: the initial PR is
> https://github.com/apache/beam/pull/15057, which goes up to section ~4.3.
> JIRA link for the Programming Guide changes:
> https://issues.apache.org/jira/browse/BEAM-12513
>
>
> On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote:
> > Yup!
> >
> > My immediate plan is to work on incorporating the Go SDK fully into the
> > Beam Programming Guide. I've audited the guide, and am beginning to add
> > missing content and fill in the Go-specific gaps. This will be tied to
> > improving the GoDoc with more Go-specific user documentation that isn't
> > appropriate for the BPG.
> >
> > My audit of the guide is here:
> >
> > https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> >
> > The other sheets focus on features and tests. The feature page looks worse
> > than it is, as it was more productive to focus on what isn't available than
> > what is. That's a snapshot of my actual working sheet, but I'll be updating
> > it as needed.
> >
> > On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> >
> > > Oops, forgot to ask one question. Will this come with revamped
> > > website instructions/docs for golang too?
> > >
> > > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > > >
> > > > Huge +1
> > > >
> > > > This is definitely something many people have asked about, so it is
> > > > great to see it finally happening.
> > > >
> > > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <ke...@apache.org> wrote:
> > > > >
> > > > > +1 awesome
> > > > >
> > > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lostluck@apache.org> wrote:
> > > > >>
> > > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules
> > > > >> and LICENSE issue) done before the 2.32 cut, and certainly before the
> > > > >> 2.33 cut if release images aren't added to the 2.32 process.
> > > > >>
> > > > >> Regarding Go Generics: at some point in the future, we may want a
> > > > >> harder break between a newer generics-first API and the current
> > > > >> version, but there's no rush. Generics/type parameters in Go aren't
> > > > >> identical to the feature referred to by that term in Java, C++, Rust,
> > > > >> etc., so it'll take a bit of time for that expertise to develop.
> > > > >>
> > > > >> However, due to the current nature of Go, we had to build fairly
> > > > >> sophisticated reflective analysis to handle DoFns and map them to their
> > > > >> graph inputs. So adding new helpers like KV, emitter, and iterator
> > > > >> types shouldn't be too difficult. Changes to Go SDK internals to use
> > > > >> generics (such as the implementation of stats DoFns like Min, Max, etc.)
> > > > >> could also be made transparently to most users, and certainly any of
> > > > >> the framework for execution-time handling (the "worker's SDK harness")
> > > > >> could be cleaned up if need be. Finally, more sophisticated DoFn
> > > > >> registration and code generation could replace the optional code
> > > > >> generator entirely, saving some users a `go generate` step and making
> > > > >> improved execution performance easier to get.
> > > > >>
> > > > >> Changing things like making a type-parameterized PCollection would
> > > > >> be far more involved, as would trying to use some kind of Apply format.
> > > > >> The lack of method overrides prevents the apply-chaining approach, or
> > > > >> at least prevents it from working simply.
> > > > >>
> > > > >> Finally, Go Generics won't be available until Go 1.18, which isn't
> > > > >> until next year. See https://blog.golang.org/generics-proposal for
> > > > >> details.
> > > > >>
> > > > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a
> > > > >> register-based calling convention, leading to a modest performance
> > > > >> improvement across the board.
> > > > >>
> > > > >> Cheers,
> > > > >> Robert Burke
> > > > >>
> > > > >> On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote:
> > > > >> > +1 to declaring Golang support out of experimental once the Go
> > > > >> > Modules issues are solved. I don't think an SDK needs to support
> > > > >> > every feature to be accepted, especially now that we can do
> > > > >> > cross-language transforms, and Go definitely supports enough to be
> > > > >> > quite useful. (WRT streaming, my understanding is that Go supports
> > > > >> > the streaming model with windows and timestamps, and runs fine on a
> > > > >> > streaming runner, even if more advanced features like state and
> > > > >> > timers aren't yet available.)
> > > > >> >
> > > > >> > This is a great milestone.
> > > > >> >
> > > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <tysonjh@google.com> wrote:
> > > > >> > >
> > > > >> > > WOW! Big news.
> > > > >> > >
> > > > >> > > I'm supportive of leaving experimental status after Go Modules
> > > > >> > > are completed and the LICENSE issue is resolved. I don't think that
> > > > >> > > lacking streaming support is a blocker. The other thing I checked
> > > > >> > > was whether there were metrics available on metrics.beam.apache.org,
> > > > >> > > specifically for measuring code health via post-commit over time;
> > > > >> > > there are, and the passing test rate is high (Huzzah!). The one
> > > > >> > > thing that surprised me from your summary is that when Go introduces
> > > > >> > > generics it won't result in any backwards incompatible changes in
> > > > >> > > Apache Beam. That's great news, but does it mean there will be a
> > > > >> > > need to support both non-generic and generic APIs moving forward?
> > > > >> > > It seems like generics will be introduced in the Go 1.17 release
> > > > >> > > (optimistically) in August this year.
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lostluck@apache.org> wrote:
> > > > >> > >>
> > > > >> > >> Hello Beam Community!
> > > > >> > >>
> > > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > > >> > >>
> > > > >> > >> This thread is to discuss it as a community, along with any remaining conditions that would prevent the exit.
> > > > >> > >>
> > > > >> > >> tl;dr
> > > > >> > >> Ask questions for answers and links! I have both.
> > > > >> > >> This entails including it officially in the release process, removing the various "experimental" text throughout the repo, etc.,
> > > > >> > >> and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
> > > > >> > >>
> > > > >> > >> The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
> > > > >> > >> Apache Beam Go jobs can execute on, and are tested against, all portable runners.
> > > > >> > >> The core APIs are not going to change in incompatible ways going forward.
> > > > >> > >> Scalable transforms can be written through SplittableDoFns or via cross-language transforms.
> > > > >> > >>
> > > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
> > > > >> > >> Communities grow through contributions and use, and experimental markers dissuade users.
> > > > >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome)
> > > > >> > >>
> > > > >> > >> Why Exit Experimental now?
> > > > >> > >>
> > > > >> > >> Typically when we call an SDK or API experimental, it's because there's a risk that the API or its behaviors may change significantly.
> > > > >> > >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking to older versions or forking
> > > > >> > >> to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been
> > > > >> > >> previous discussion about what the "low bar" is for a new SDK, it hasn't been formally applied to the Go SDK. I feel this has
> > > > >> > >> hurt development and contribution of new SDK languages (inherent difficulty of SDK development notwithstanding).
> > > > >> > >>
> > > > >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go.
> > > > >> > >> The initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail on what it means for a language without
> > > > >> > >> generics, overloading, or inheritance to implement the Beam model. One could largely throw away static types (like Python),
> > > > >> > >> but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard
> > > > >> > >> to tell if an API is any good before there are users.
> > > > >> > >>
> > > > >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible
> > > > >> > >> bottleneck to need to do all initial fanout of work on a single machine and write everything to a Reshuffle just in order to scale up.
> > > > >> > >> Without being able to scale, Beam is little more than overhead.
> > > > >> > >>
> > > > >> > >> At this point, both of these needs are met within the Go SDK for open source.
> > > > >> > >>
> > > > >> > >> Background
> > > > >> > >>
> > > > >> > >> The Go SDK has been a part of the Beam repo for a few years now, since it was accidentally merged into master.
> > > > >> > >> Since then it's been called experimental, and not officially part of the releases.
> > > > >> > >>
> > > > >> > >> Of the SDKs, it is the one that was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific) workers.
> > > > >> > >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now
> > > > >> > >> on all supported portable runners, like Flink, Spark, the Python Portable runner, and Dataflow.
> > > > >> > >>
> > > > >> > >> API Stability
> > > > >> > >>
> > > > >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no
> > > > >> > >> changes to that on the horizon that can't be made in a backwards compatible manner. Largely these are related to new features, or
> > > > >> > >> usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).
> > > > >> > >>
> > > > >> > >> It's an open secret that the Go SDK has largely been under work for use within Google. Its use there is called FlumeGo, representing
> > > > >> > >> the Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus has been on improving
> > > > >> > >> batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or
> > > > >> > >> usability concerns.
> > > > >> > >>
> > > > >> > >> Scalability
> > > > >> > >>
> > > > >> > >> Google could get away without the Go SDK having an SDK-side scalability solution as a result of its integration with Flume.
> > > > >> > >> However, those days are now past.
> > > > >> > >>
> > > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which enables writing scalable batch transforms natively
> > > > >> > >> in the Go SDK.
> > > > >> > >> The SDK also supports Cross Language Transforms, with Beam Schema encodings. With them, production-hardened transforms
> > > > >> > >> from Java and Python are a wrapper away.
> > > > >> > >>
> > > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side work, and completed the Xlang work) is adding a wrapper for the
> > > > >> > >> Java Kafka IO using Cross Language Transforms, which has often been requested. This will also enable use of the Beam SQL
> > > > >> > >> transforms that Java provides.
> > > > >> > >>
> > > > >> > >> Features
> > > > >> > >>
> > > > >> > >> The Go SDK implements the Beam core. It implements the standard coders, allows for user DoFns and CombineFns, and provides access
> > > > >> > >> to core transforms like Flatten and GroupByKey, and features like Side Inputs, Windowing, and User Metrics.
> > > > >> > >> Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.
> > > > >> > >>
> > > > >> > >> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
> > > > >> > >>
> > > > >> > >> Repo Testing
> > > > >> > >>
> > > > >> > >> On precommit the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner,
> > > > >> > >> making it quick and robust to detect breaking changes without overspending community resources. Those same tests are also
> > > > >> > >> run against Dataflow, Flink, and Spark.
> > > > >> > >>
> > > > >> > >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server),
> > > > >> > >> or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones
> > > > >> > >> is on the wiki [2]. They are accessible to Go developers since they're implemented with the standard Go testing tools.
> > > > >> > >>
> > > > >> > >> Shortcomings
> > > > >> > >> That said, there's still much to do. Let me briefly tell you what doesn't work; it's up to you to weigh whether those gaps block
> > > > >> > >> leaving experimental.
> > > > >> > >>
> > > > >> > >> At present, only textio has been implemented as a Splittable DoFn.
> > > > >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of
> > > > >> > >> new transform wrappers for the Go SDK.
> > > > >> > >> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.
> > > > >> > >>
> > > > >> > >> In the core SDK, more streaming-focused features have yet to be implemented, but they're largely additions to what exists already
> > > > >> > >> rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate
> > > > >> > >> FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] about what's missing for additional streaming features.
> > > > >> > >>
> > > > >> > >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK,
> > > > >> > >> in particular regarding the included transform libraries and examples.
> > > > >> > >>
> > > > >> > >> Moving Forward
> > > > >> > >>
> > > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and
> > > > >> > >> am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific
> > > > >> > >> user documentation that isn't appropriate for the BPG.
> > > > >> > >> I'll also resolve the LICENSE issue around the public display of that GoDoc.
> > > > >> > >>
> > > > >> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental"
> > > > >> > >> language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers.
> > > > >> > >> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
> > > > >> > >>
> > > > >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control,
> > > > >> > >> which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users
> > > > >> > >> on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the
> > > > >> > >> pkg.go.dev list at [4].
> > > > >> > >>
> > > > >> > >> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided
> > > > >> > >> a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can
> > > > >> > >> rather than that they shouldn't.
> > > > >> > >>
> > > > >> > >> Robert Burke
> > > > >> > >> De facto Beam Go TL
> > > > >> > >>
> > > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> > > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > > >> >
> > >
> >
>
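For readers weighing the generics exchange quoted above, a small sketch may help contrast the two styles being discussed. Beam Go DoFns today express outputs as emitter functions (such as `func(string, int)`) and grouped values as iterator functions of the form `func(*int) bool`, which the SDK discovers by reflection. The `KV`, `Iter`, and `Collect` names below are hypothetical generics-era helpers (Go 1.18+) invented purely for illustration; they are not Beam Go SDK APIs, no pipeline or runner is involved, and the "GroupByKey" is just a local map.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWords follows the pre-generics Beam Go DoFn style: it writes outputs
// by calling an emitter func, here one (word, 1) pair per word.
func splitWords(line string, emit func(string, int)) {
	for _, w := range strings.Fields(line) {
		emit(w, 1)
	}
}

// sumCounts reads grouped values through an iterator func that fills a
// pointer and returns false once exhausted, then emits the total for key.
func sumCounts(key string, values func(*int) bool, emit func(string, int)) {
	total, v := 0, 0
	for values(&v) {
		total += v
	}
	emit(key, total)
}

// KV is a hypothetical typed key/value pair using Go 1.18 type parameters.
// It is NOT part of the Beam Go SDK.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Iter is a hypothetical helper adapting a slice to the func(*T) bool
// iterator shape that sumCounts consumes.
func Iter[T any](xs []T) func(*T) bool {
	i := 0
	return func(out *T) bool {
		if i >= len(xs) {
			return false
		}
		*out = xs[i]
		i++
		return true
	}
}

// Collect is a hypothetical helper returning an emitter that appends each
// emitted pair to dst as a KV.
func Collect[K comparable, V any](dst *[]KV[K, V]) func(K, V) {
	return func(k K, v V) { *dst = append(*dst, KV[K, V]{k, v}) }
}

func main() {
	// Stand-in for a ParDo over a single input element.
	var pairs []KV[string, int]
	splitWords("a b a", Collect(&pairs))

	// Stand-in for GroupByKey: a local map, keeping first-seen key order.
	grouped := map[string][]int{}
	var order []string
	for _, p := range pairs {
		if _, ok := grouped[p.Key]; !ok {
			order = append(order, p.Key)
		}
		grouped[p.Key] = append(grouped[p.Key], p.Value)
	}

	// Stand-in for a per-key combining DoFn.
	var totals []KV[string, int]
	for _, k := range order {
		sumCounts(k, Iter(grouped[k]), Collect(&totals))
	}
	fmt.Println(totals) // [{a 2} {b 1}]
}
```

Note that the DoFn signatures themselves stay function-shaped; in this sketch the typed helpers only adapt to and from them, which is one way such helpers could be layered on without changing existing user code (an illustration, not a description of the SDK's actual plans).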

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
Regarding documentation update: Initial PR is https://github.com/apache/beam/pull/15057, which goes up to section ~4.3. JIRA link for Programming Guide changes: https://issues.apache.org/jira/browse/BEAM-12513


On 2021/06/17 14:58:54, Robert Burke <ro...@frantil.com> wrote: 
> Yup!
> 
> My immediate plan is to work on incorporating the Go SDK fully into the
> Beam Programming Guide. I've audited the guide, and
> am beginning to add missing content and filling in the Go specific gaps.
> This will be tied to improving the Go Doc with more Go
> specific user documentation that isn't appropriate for the BPG.
> 
> My audit of the guide is here:
> https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090
> 
> The other sheets focus on features and tests. The feature page looks worse
> than it is, as it was more productive to focus on what isn't available than
> what is. That's a snapshot of my actual working sheet but I'll be updating
> it as needed.
> 
> On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:
> 
> > Oups forgot to write one question. Will this come with revamped
> > website instructions/doc for golang too?
> >
> > On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> > >
> > > Huge +1
> > >
> > > This is definitely something many people have asked about, so it is
> > > great to see it finally happening.
> > >
> > > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <ke...@apache.org> wrote:
> > > >
> > > > +1 awesome
> > > >
> > > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lo...@apache.org>
> > wrote:
> > > >>
> > > >> Sounds reasonable to me. I agree. We'll aim to get those (Go modules
> > and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33
> > cut if release images aren't added to the 2.32 process.
> > > >>
> > > >> Regarding Go Generics: at some point in the future, we may want a
> > harder break between a newer Generic first API and and the current version,
> > but there's no rush. Generics/TypeParameters in Go aren't identical to the
> > feature referred to by that term in Java, C++, Rust, etc, so it'll take a
> > bit of time for that expertise to develop.
> > > >>
> > > >> However, by the current nature of Go, we had to have pretty
> > sophisticated reflective analysis to handle DoFns and map them to their
> > graph inputs. So, adding new helpers like a KV, emitter, and Iterator
> > types, shouldn't be too difficult. Changing Go SDK internals to use
> > generics (like the implementation of Stats DoFns like Min, Max, etc) would
> > also be able to be made transparently to most users, and certainly any of
> > the framework for execution time handling (the "worker's SDK harness")
> > would be able to be cleaned up if need be. Finally, adding more
> > sophisticated DoFn registration and code generation would be able to
> > replace the optional code generator entirely, saving some users a `go
> > generate` step, simplifying getting improved execution performance.
> > > >>
> > > >> Changing things like making a Type Parameterized PCollection, would
> > be far more involved, as would trying to use some kind of Apply format. The
> > lack of Method Overrides prevents the apply chaining approach. Or at least
> > prevents it from working simply.
> > > >>
> > > >> Finally, Go Generics won't be available until Go 1.18, which isn't
> > until next year. See https://blog.golang.org/generics-proposal for
> > details.
> > > >>
> > > >> Go 1.17 https://tip.golang.org/doc/go1.17 does include a Register
> > calling convention, leading to a modest performance improvement across the
> > board.
> > > >>
> > > >> Cheers,
> > > >> Robert Burke
> > > >>
> > > >> On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote:
> > > >> > +1 to declaring Golang support out of experimental once the Go
> > Modules
> > > >> > issues are solved. I don't think an SDK needs to support every
> > feature
> > > >> > to be accepted, especially now that we can do cross-language
> > > >> > transforms, and Go definitely supports enough to be quite useful.
> > (WRT
> > > >> > streaming, my understanding is that Go supports the streaming model
> > > >> > with windows and timestamps, and runs fine on a streaming runner,
> > even
> > > >> > if more advanced features like state and timers aren't yet
> > available.)
> > > >> >
> > > >> > This is a great milestone.
> > > >> >
> > > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <ty...@google.com>
> > wrote:
> > > >> > >
> > > >> > > WOW! Big news.
> > > >> > >
> > > >> > > I'm supportive of leaving experimental status after Go Modules
> > are completed and the LICENSE issue is resolved. I don't think that lacking
> > streaming support is a blocker. The other thing I checked to see was if
> > there were metrics available on metrics.beam.apache.org, specifically for
> > measuring code health via post-commit over time, which there are and the
> > passing test rate is high (Huzzah!). The one thing that surprised me from
> > your summary is that when Go introduces generics it won't result in any
> > backwards incompatible changes in Apache Beam. That's great news, but does
> > it mean there will be a need to support both non-generic and generic APIs
> > moving forward? It seems like generics will be introduced in the Go 1.17
> > release (optimistically) in August this year.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lo...@apache.org>
> > wrote:
> > > >> > >>
> > > >> > >> Hello Beam Community!
> > > >> > >>
> > > >> > >> I propose we stop calling the Apache Beam Go SDK experimental.
> > > >> > >>
> > > >> > >> This thread is to discuss it as a community, and any conditions
> > that remain that would prevent the exit.
> > > >> > >>
> > > >> > >> tl;dr;
> > > >> > >> Ask Questions for answers and links! I have both.
> > > >> > >> This entails including it officially in the Release process,
> > removing the various "experimental" text throughout the repo etc,
> > > >> > >> and otherwise treating it like Python and Java. Some Go specific
> > tasks around dep versioning.
> > > >> > >>
> > > >> > >> The Go SDK implements the beam model efficiently for most batch
> > tasks, including basic windowing.
> > > >> > >> Apache Beam Go jobs can execute, and are tested on all Portable
> > runners.
> > > >> > >> The core APIs are not going to change in incompatible ways going
> > forward.
> > > >> > >> Scalable transforms can be written through SplittableDoFns or
> > via Cross Language transforms.
> > > >> > >>
> > > >> > >> The SDK isn't 100% feature complete, but keeping it experimental
> > doesn't help with that any further.
> > > >> > >> Communities grow through contributions and use, and experimental
> > markers dissuade users.
> > > >> > >> There's plenty to do in order expand what can be done with the
> > SDK. (Contributions welcome)
> > > >> > >>
> > > >> > >> Why Exit Experimental now?
> > > >> > >>
> > > >> > >> Typically when we call an SDK or API Experimental, it's because
> > there's a risk that API or behaviors may change significantly.
> > > >> > >> This in turn, leads to additional work for users of the SDK on
> > every release which leads to sticking to older versions or forking
> > > >> > >> to preserve behavior. Version updates should be looked forward
> > to, and viewed as having little risk. Further while there's been
> > > >> > >> previous dicussion about what the "low bar" is for a new SDK, it
> > hasn't been summarily applied to the Go SDK. I feel this has
> > > >> > >> hurt development and contribution of new SDK languages (inherent
> > difficulty of SDK development notwithstanding).
> > > >> > >>
> > > >> > >> When the SDK was designed, it wasn't entirely clear what the
> > Beam Model should look like in an opinionated language like Go.
> > > >> > >> Their initial take (see
> > https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail what it
> > means for a language without
> > > >> > >> Generics, or overloading, or inheritance to implement the beam
> > model. One could largely throw away static types (like Python),
> > > >> > >> but this approach rings hollow for Go. It would not do if the
> > approach couldn't grow and scale to the Beam Model. It's also hard
> > > >> > >> to tell if an API is any good before there are users.
> > > >> > >>
> > > >> > >> Further, in the early days of Portability, there wasn't a way to
> > write scalable DoFns, dynamically or otherwise. It's an incredible
> > > >> > >> bottleneck to need to do all initial fanout of work on a single
> > machine, write everything to a Reshuffle, just in order to scale up.
> > > >> > >> Without being able to scale, Beam is little more than overhead.
> > > >> > >>
> > > >> > >> At this point, both of these needs are met within the Go SDK for
> > open source.
> > > >> > >>
> > > >> > >> Background
> > > >> > >>
> > > >> > >> The Go SDK has been a part of the beam repo for a few years now,
> > since it was accidentally merged into master.
> > > >> > >> Since then it's been called experimental, and not officially
> > part of the releases.
> > > >> > >>
> > > >> > >> Of the SDKs, it's was always designed around Beam Portability
> > first. It never had any "Legacy" (SDK x Runner specific ) workers.
> > > >> > >> It's always used the Beam Pipeline protos and FnAPI to execute
> > jobs, first with some very experimental code on Dataflow, but now
> > > >> > >> on all portable supported runners, like Flink, Spark, the Python
> > Portable runner, and Dataflow.
> > > >> > >>
> > > >> > >> API Stability
> > > >> > >>
> > > >> > >> The Go SDK hasn't meaningfully changed it's user API for DoFn
> > and pipeline construction since it was first merged in, and there are no
> > > >> > >> changes to that on the horizon that can't be made in a backwards
> > compatible manner. Largely these are related to New Features, or
> > > >> > >> usability improvements enabled by the advent of Go Generics
> > (think of "real" KV, emitter, and iterator types).
> > > >> > >>
> > > >> > >> It's an open secret that the Go SDK has largely been under work
> > for use within Google. It's use is called FlumeGo, representing
> > > >> > >> the Apache Beam Go SDK, running on top of Flume, Google's batch
> > pipeline processing engine. Thus most of the focus on improving
> > > >> > >> batch execution. FlumeGo sees ample use today, and there hasn't
> > been a call for fundamental changes to the API for ergonomic or
> > > >> > >> usability concerns.
> > > >> > >>
> > > >> > >> Scalability
> > > >> > >>
> > > >> > >> Google could get away without the Go SDK having an SDK side
> > scalability solution as a result of it's integration with Flume.
> > > >> > >> However, those days are now past.
> > > >> > >>
> > > >> > >> The Go SDK now supports SplittableDoFns along with Dynamic
> > Splitting, which supports writing scalable batch transforms natively
> > > >> > >> in the Go SDK.
> > > >> > >> The SDK also supports Cross Language Transforms, with Beam
> > Schema encodings. With it, production hardened transforms
> > > >> > >> from Java and Python are a wrapper away.
> > > >> > >>
> > > >> > >> Presently, Daniel Oliveira (who implemented the SDF side work,
> > and completed the Xlang work,) is adding a wrapper for the
> > > >> > >> Java Kafka IO using Cross Language Transforms, which is often
> > been requested. This will also enable use of the Beam SQL
> > > >> > >> transforms that java enables.
> > > >> > >>
> > > >> > >> Features
> > > >> > >>
> > > >> > >> The Go SDK implements the Beam C=core. The Go SDK implements
> > standard coders, allows for user DoFns, and CombineFns and access
> > > >> > >> to core transforms like Flatten, GroupByKey, and features like
> > Side Inputs, Windowing, and User Metrics.
> > > >> > >> Basic windowing will be fully supported for batch even through
> > lifted combines in the 2.32.0 release.
> > > >> > >>
> > > >> > >> All of the above enables Beam Go to be versatile for batch
> > execution on portable runners, and for simple streaming pipelines.
> > > >> > >>
> > > >> > >> Repo Testing
> > > >> > >>
> > > >> > >> On precommit the Go SDK runs all it's unit tests. On top of
> > that, it runs all it's integration tests against the Python Portable runner,
> > > >> > >> making it quick and robust to detect breaking changes without
> > overspending community resources. Those same tests are also
> > > >> > >> run against Dataflow, Flink, and Spark.
> > > >> > >>
> > > >> > >> The tests are executable against all runners via the appropriate
> > Go commands (if you've stood up your own job management server),
> > > >> > >> or Gradle commands (which will spin up runner instances for
> > you). Documentation for executing tests and adding new ones
> > > >> > >> is on the wiki. [2] They are accessible to Go developers as
> > they're implemented with the standard Go testing tools.
> > > >> > >>
> > > >> > >> Shortcomings
> > > >> > >> That said, there's still much to do. Let me briefly tell you
> > what doesn't work, and it's up to you to weigh whether they block
> > > >> > >> being out of experimental.
> > > >> > >>
> > > >> > >> At present, only a textio has been implemented as Splittable
> > DoFn.
> > > >> > >> Once the Kafka wrapper is merged in, it will serve as a the
> > first example for future contributions for
> > > >> > >> new transform wrappers for the Go SDK.
> > > >> > >> Transforms and IOs are lacking, but at this point users are
> > empowered to write their own DoFns or wrap existing transforms for Cross
> > Language use.
> > > >> > >>
> > > >> > >> In the core SDK, more streaming focused features have yet to be
> > implemented, but they're largely additions to what exists already
> > > >> > >> rather than total rebuilds. Much of the work is definining how a
> > > >> > >> user specifies their desires and turning those into the appropriate
> > > >> > >> FnAPI requests at execution time. Back in October I wrote at length
> > > >> > >> on the wiki [1] about what's missing for additional streaming features.
> > > >> > >>
> > > >> > >> While we have bolstered our testing recently, there's likely still
> > > >> > >> more we could test to improve our confidence in the SDK, in
> > > >> > >> particular regarding the included transforms libraries and examples.
> > > >> > >>
> > > >> > >> Moving Forward
> > > >> > >>
> > > >> > >> My immediate plan is to work on incorporating the Go SDK fully into
> > > >> > >> the Beam Programming Guide. I've audited the guide [3], and am
> > > >> > >> beginning to add missing content and fill in the Go-specific gaps.
> > > >> > >> This will be tied to improving the GoDoc with more Go-specific user
> > > >> > >> documentation that isn't appropriate for the BPG, and to resolving
> > > >> > >> the LICENSE issue around the public display of that GoDoc.
> > > >> > >>
> > > >> > >> If this proposal is accepted by a binding vote, I will incorporate
> > > >> > >> the SDK into the release process and remove the "experimental"
> > > >> > >> language around the SDK. This largely entails updating the release
> > > >> > >> scripts to also build and publish the Go SDK Docker containers.
> > > >> > >> As for releasing the code, we're technically already doing so
> > > >> > >> whenever we tag a release branch [4].
> > > >> > >>
> > > >> > >> The clearest signal to the Go community, however, will be migrating
> > > >> > >> the SDK to use Go Modules for dependency version control, which
> > > >> > >> Daniel plans to work on after his Kafka task. This will put our
> > > >> > >> repo infrastructure, SDK contributors, and users on the same
> > > >> > >> footing when it comes to dependency management. It will remove the
> > > >> > >> "+incompatible" tags one sees on the pkg.go.dev list at [4].
> > > >> > >>
> > > >> > >> I'm very happy to answer any questions you might have about the
> > > >> > >> SDK, and provide additional links as needed. I intentionally avoided
> > > >> > >> a link barrage in this email, as links can distract from the point:
> > > >> > >> the SDK is ready for folks to use, and we need to tell them that
> > > >> > >> they can rather than that they shouldn't.
> > > >> > >>
> > > >> > >> Robert Burke
> > > >> > >> De facto Beam Go TL
> > > >> > >>
> > > >> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> > > >> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> > > >> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> > > >> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> > > >> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> > > >> >
> >
> 
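The "+incompatible" tags mentioned above come from Go's semantic import versioning rule: a version v2 or higher whose module path lacks a matching /vN suffix, and whose tagged commit has no go.mod file, is listed with "+incompatible" appended. The Go sketch below models only that rule, in simplified form; it is not what the go command literally runs. The function names are hypothetical, and the example module paths and tags (including one ending in /v2) are illustrative assumptions, not a statement of Beam's actual module layout.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// majorVersion extracts N from a semantic version such as "v2.30.0".
func majorVersion(version string) (int, error) {
	if !strings.HasPrefix(version, "v") {
		return 0, fmt.Errorf("version %q must start with \"v\"", version)
	}
	parts := strings.SplitN(version[1:], ".", 2)
	return strconv.Atoi(parts[0])
}

// hasMajorSuffix reports whether modulePath ends in "/vN" for the given N.
func hasMajorSuffix(modulePath string, major int) bool {
	return strings.HasSuffix(modulePath, "/v"+strconv.Itoa(major))
}

// resolvedVersion models (simplified) how a tagged version is listed.
// hasGoMod says whether the tagged commit contains a go.mod file.
func resolvedVersion(modulePath, version string, hasGoMod bool) (string, error) {
	major, err := majorVersion(version)
	if err != nil {
		return "", err
	}
	switch {
	case major < 2 || hasMajorSuffix(modulePath, major):
		// v0/v1, or a v2+ module whose path carries the matching /vN suffix.
		return version, nil
	case !hasGoMod:
		// Legacy v2+ tag in a repository that has not adopted modules.
		return version + "+incompatible", nil
	default:
		// A go.mod exists but the path lacks /vN: the go command rejects this.
		return "", fmt.Errorf("%s@%s: module path needs a /v%d suffix", modulePath, version, major)
	}
}

func main() {
	// Illustrative inputs; the paths and tags are assumptions.
	cases := []struct {
		path, version string
		hasGoMod      bool
	}{
		{"github.com/apache/beam", "v2.30.0", false},
		{"github.com/apache/beam/sdks/v2", "v2.33.0", true},
	}
	for _, c := range cases {
		v, err := resolvedVersion(c.path, c.version, c.hasGoMod)
		fmt.Println(c.path, v, err)
	}
	// Output:
	// github.com/apache/beam v2.30.0+incompatible <nil>
	// github.com/apache/beam/sdks/v2 v2.33.0 <nil>
}
```

Under this model, adding a go.mod alone is not enough for v2+ tags; the module path also has to carry the major-version suffix, which is why such a migration usually changes import paths.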

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <ro...@frantil.com>.
Yup!

My immediate plan is to work on incorporating the Go SDK fully into the
Beam Programming Guide. I've audited the guide, and am beginning to add
missing content and fill in the Go-specific gaps. This will be tied to
improving the GoDoc with more Go-specific user documentation that isn't
appropriate for the BPG.

My audit of the guide is here:
https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090

The other sheets focus on features and tests. The feature page looks worse
than it is, as it was more productive to focus on what isn't available than
what is. That's a snapshot of my actual working sheet, but I'll update it
as needed.

On Thu, Jun 17, 2021, 6:23 AM Ismaël Mejía <ie...@gmail.com> wrote:

> Oops, forgot to ask one question. Will this come with revamped
> website instructions/docs for golang too?
>
> On Thu, Jun 17, 2021 at 3:21 PM Ismaël Mejía <ie...@gmail.com> wrote:
> >
> > Huge +1
> >
> > This is definitely something many people have asked about, so it is
> > great to see it finally happening.
> >
> > On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <ke...@apache.org> wrote:
> > >
> > > +1 awesome
> > >
> > > On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lo...@apache.org> wrote:
> > >>
> > >> Sounds reasonable to me. I agree. We'll aim to get those (Go
> > >> modules and LICENSE issue) done before the 2.32 cut, and certainly
> > >> before the 2.33 cut if release images aren't added to the 2.32
> > >> process.
> > >>
> > >> Regarding Go Generics: at some point in the future, we may want a
> > >> harder break between a newer generics-first API and the current
> > >> version, but there's no rush. Generics (type parameters) in Go aren't
> > >> identical to the feature referred to by that term in Java, C++, Rust,
> > >> etc., so it'll take a bit of time for that expertise to develop.
> > >>
> > >> However, by the current nature of Go, we had to have pretty
> > >> sophisticated reflective analysis to handle DoFns and map them to
> > >> their graph inputs. So adding new helpers like KV, emitter, and
> > >> iterator types shouldn't be too difficult. Changing Go SDK internals
> > >> to use generics (like the implementation of stats DoFns such as Min,
> > >> Max, etc.) could also be done transparently to most users, and any of
> > >> the framework for execution-time handling (the "worker's SDK
> > >> harness") could certainly be cleaned up if need be. Finally, more
> > >> sophisticated DoFn registration and code generation could replace the
> > >> optional code generator entirely, saving some users a `go generate`
> > >> step and making improved execution performance simpler to get.
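To make the "reflective analysis" point concrete: without generics, a Go DoFn is an ordinary function value, and its shape can only be discovered at runtime with the standard reflect package. The toy sketch below classifies a function's parameters as a leading context, element inputs, or emitters (func-typed parameters with no results), and separates a trailing error result. It is a hypothetical illustration of the idea, far simpler than the analysis the Beam Go SDK actually performs; `signature`, `analyze`, and the two example DoFns are invented names.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
	"strings"
)

var (
	contextType = reflect.TypeOf((*context.Context)(nil)).Elem()
	errorType   = reflect.TypeOf((*error)(nil)).Elem()
)

// signature is a hypothetical summary of a DoFn's shape.
type signature struct {
	HasContext bool           // leading context.Context parameter
	Inputs     []reflect.Type // element parameters fed from the input PCollection
	Emitters   []reflect.Type // func-typed parameters with no results, used to emit
	Outputs    []reflect.Type // direct return values, excluding a trailing error
	HasError   bool           // trailing error result
}

// analyze inspects fn with reflection; it accepts any value and rejects non-functions.
func analyze(fn interface{}) (signature, error) {
	var sig signature
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return sig, fmt.Errorf("expected a function, got %v", t)
	}
	for i := 0; i < t.NumIn(); i++ {
		p := t.In(i)
		switch {
		case i == 0 && p == contextType:
			sig.HasContext = true
		case p.Kind() == reflect.Func && p.NumOut() == 0:
			sig.Emitters = append(sig.Emitters, p)
		default:
			sig.Inputs = append(sig.Inputs, p)
		}
	}
	n := t.NumOut()
	if n > 0 && t.Out(n-1) == errorType {
		sig.HasError = true
		n--
	}
	for i := 0; i < n; i++ {
		sig.Outputs = append(sig.Outputs, t.Out(i))
	}
	return sig, nil
}

// splitWords is an example DoFn in the func(input, emitter) style.
func splitWords(line string, emit func(string)) {
	for _, w := range strings.Fields(line) {
		emit(w)
	}
}

// formatCount is an example DoFn with a context, two inputs, and an error result.
func formatCount(ctx context.Context, word string, count int) (string, error) {
	return fmt.Sprintf("%s: %d", word, count), ctx.Err()
}

func main() {
	for _, fn := range []interface{}{splitWords, formatCount} {
		sig, err := analyze(fn)
		if err != nil {
			panic(err)
		}
		fmt.Printf("ctx=%v inputs=%v emitters=%v outputs=%v err=%v\n",
			sig.HasContext, sig.Inputs, sig.Emitters, sig.Outputs, sig.HasError)
	}
}
```

Because this kind of analysis already works on plain function types, generic helper types such as a typed emitter could, in principle, be recognized the same way, which is the sense in which such additions need not break existing DoFns.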
> > >>
> > >> Changing things like making a type-parameterized PCollection would
> > >> be far more involved, as would trying to use some kind of Apply
> > >> format. The lack of method overrides prevents the apply chaining
> > >> approach, or at least prevents it from working simply.
> > >>
> > >> Finally, Go Generics won't be available until Go 1.18, which isn't
> > >> until next year. See https://blog.golang.org/generics-proposal for
> > >> details.
> > >>
> > >> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a
> > >> register-based calling convention, leading to a modest performance
> > >> improvement across the board.
> > >>
> > >> Cheers,
> > >> Robert Burke
> > >>
> > >> On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote:
> > >> > +1 to declaring Golang support out of experimental once the Go Modules
> > >> > issues are solved. I don't think an SDK needs to support every feature
> > >> > to be accepted, especially now that we can do cross-language
> > >> > transforms, and Go definitely supports enough to be quite useful. (WRT
> > >> > streaming, my understanding is that Go supports the streaming model
> > >> > with windows and timestamps, and runs fine on a streaming runner, even
> > >> > if more advanced features like state and timers aren't yet available.)
> > >> >
> > >> > This is a great milestone.
> > >> >
> > >> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <ty...@google.com> wrote:
> > >> > >
> > >> > > WOW! Big news.
> > >> > >
> > >> > > I'm supportive of leaving experimental status after Go Modules are
> > >> > > completed and the LICENSE issue is resolved. I don't think that
> > >> > > lacking streaming support is a blocker. The other thing I checked
> > >> > > was whether there were metrics available on metrics.beam.apache.org,
> > >> > > specifically for measuring code health via post-commit over time,
> > >> > > which there are, and the passing test rate is high (Huzzah!). The
> > >> > > one thing that surprised me from your summary is that when Go
> > >> > > introduces generics it won't result in any backwards incompatible
> > >> > > changes in Apache Beam. That's great news, but does it mean there
> > >> > > will be a need to support both non-generic and generic APIs moving
> > >> > > forward? It seems like generics will be introduced in the Go 1.17
> > >> > > release (optimistically) in August this year.
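On the question of supporting both non-generic and generic APIs: one possible pattern, sketched below under stated assumptions, is to keep an existing `interface{}`-based entry point with an unchanged signature and have its body delegate to a generic helper. That is the kind of change "transparent to most users" described elsewhere in this thread for stats DoFns like Min and Max. `MinAny`, `minOf`, `minTyped`, and the `Ordered` constraint are hypothetical names (Go 1.21 and later ship an equivalent `cmp.Ordered`), and the code needs Go 1.18 or later.

```go
package main

import (
	"errors"
	"fmt"
)

// Ordered is a hypothetical constraint for types supporting <.
// (Go 1.21 and later ship an equivalent cmp.Ordered.)
type Ordered interface {
	~int | ~int8 | ~int16 | ~int32 | ~int64 |
		~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
		~float32 | ~float64 | ~string
}

// minOf is the new generic implementation.
func minOf[T Ordered](xs []T) (T, error) {
	var zero T
	if len(xs) == 0 {
		return zero, errors.New("minOf: empty input")
	}
	m := xs[0]
	for _, x := range xs[1:] {
		if x < m {
			m = x
		}
	}
	return m, nil
}

// minTyped converts []interface{} to []T and calls the generic helper.
func minTyped[T Ordered](xs []interface{}) (interface{}, error) {
	typed := make([]T, len(xs))
	for i, x := range xs {
		v, ok := x.(T)
		if !ok {
			return nil, fmt.Errorf("minTyped: mixed element types (%T)", x)
		}
		typed[i] = v
	}
	m, err := minOf(typed)
	if err != nil {
		return nil, err
	}
	return m, nil
}

// MinAny stands in for a pre-generics, interface{}-based API. Its signature
// is unchanged; only its body now delegates to the generic code.
func MinAny(xs []interface{}) (interface{}, error) {
	if len(xs) == 0 {
		return nil, errors.New("MinAny: empty input")
	}
	switch xs[0].(type) {
	case int:
		return minTyped[int](xs)
	case float64:
		return minTyped[float64](xs)
	case string:
		return minTyped[string](xs)
	default:
		return nil, fmt.Errorf("MinAny: unsupported element type %T", xs[0])
	}
}

func main() {
	// Existing callers keep working against the old signature.
	m, err := MinAny([]interface{}{3, 1, 2})
	fmt.Println(m, err) // 1 <nil>
	// New callers can use the statically typed helper directly.
	s, _ := minOf([]string{"beam", "go", "apache"})
	fmt.Println(s) // apache
}
```

In this arrangement both APIs exist, but only one implementation does, so supporting the older entry point costs a thin adapter rather than a second code path.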

Re: [Proposal] Go SDK Exits Experimental

Posted by Ismaël Mejía <ie...@gmail.com>.
Oops, forgot to ask one question. Will this come with revamped
website instructions/docs for golang too?

Re: [Proposal] Go SDK Exits Experimental

Posted by Ismaël Mejía <ie...@gmail.com>.
Huge +1

This is definitely something many people have asked about, so it is
great to see it finally happening.

On Wed, Jun 16, 2021 at 7:56 PM Kenneth Knowles <ke...@apache.org> wrote:
>
> +1 awesome
>
> On Wed, Jun 16, 2021 at 10:33 AM Robert Burke <lo...@apache.org> wrote:
>>
>> Sounds reasonable to me. I agree. We'll aim to get those (Go modules and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.
>>
>> Regarding Go Generics: at some point in the future, we may want a harder break between a newer generics-first API and the current version, but there's no rush. Generics (type parameters) in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
>>
>> However, because of the current nature of Go, we already needed fairly sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
>>
>> Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply-chaining approach, or at least prevents it from working simply.
>>
>> Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details.
>>
>> Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention (initially on 64-bit x86), leading to a modest performance improvement across the board.
>>
>> Cheers,
>> Robert Burke
>>
>> On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote:
>> > +1 to declaring Golang support out of experimental once the Go Modules
>> > issues are solved. I don't think an SDK needs to support every feature
>> > to be accepted, especially now that we can do cross-language
>> > transforms, and Go definitely supports enough to be quite useful. (WRT
>> > streaming, my understanding is that Go supports the streaming model
>> > with windows and timestamps, and runs fine on a streaming runner, even
>> > if more advanced features like state and timers aren't yet available.)
>> >
>> > This is a great milestone.
>> >
>> > On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <ty...@google.com> wrote:
>> > >
>> > > WOW! Big news.
>> > >
> >> > > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics, it won't result in any backwards-incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
>> > >
>> > >
>> > >
>> > > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lo...@apache.org> wrote:
>> > >>
>> > >> Hello Beam Community!
>> > >>
>> > >> I propose we stop calling the Apache Beam Go SDK experimental.
>> > >>
> >> > >> This thread is for discussing it as a community, along with any remaining conditions that would prevent the exit.
>> > >>
> >> > >> tl;dr
> >> > >> Ask questions for answers and links! I have both.
> >> > >> This entails officially including it in the release process, removing the various "experimental" text throughout the repo, etc.,
> >> > >> and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
>> > >>
> >> > >> The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
> >> > >> Apache Beam Go jobs can execute on, and are tested against, all portable runners.
> >> > >> The core APIs are not going to change in incompatible ways going forward.
> >> > >> Scalable transforms can be written through SplittableDoFns or via cross-language transforms.
>> > >>
>> > >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
>> > >> Communities grow through contributions and use, and experimental markers dissuade users.
> >> > >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome!)
>> > >>
>> > >> Why Exit Experimental now?
>> > >>
> >> > >> Typically when we call an SDK or API experimental, it's because there's a risk that the API or its behavior may change significantly.
> >> > >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking with older versions or forking
> >> > >> to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been
> >> > >> previous discussion about what the "low bar" is for a new SDK, it hasn't been formally applied to the Go SDK. I feel this has
> >> > >> hurt development and contribution of new SDK languages (the inherent difficulty of SDK development notwithstanding).
>> > >>
> >> > >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go.
> >> > >> The initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail about what it means for a language without
> >> > >> generics, overloading, or inheritance to implement the Beam model. One could largely throw away static types (like Python),
> >> > >> but that approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard
> >> > >> to tell if an API is any good before there are users.
>> > >>
> >> > >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible
> >> > >> bottleneck to have to do all the initial fan-out of work on a single machine and write everything to a Reshuffle just in order to scale up.
> >> > >> Without being able to scale, Beam is little more than overhead.
>> > >>
>> > >> At this point, both of these needs are met within the Go SDK for open source.
>> > >>
>> > >> Background
>> > >>
> >> > >> The Go SDK has been a part of the Beam repo for a few years now, since it was accidentally merged into master.
> >> > >> Since then it's been called experimental, and it hasn't officially been part of the releases.
>> > >>
> >> > >> Of the SDKs, it was always designed around Beam Portability first. It never had any "legacy" (SDK x runner specific) workers.
> >> > >> It has always used the Beam Pipeline protos and the FnAPI to execute jobs, first with some very experimental code on Dataflow, and now
> >> > >> on all supported portable runners, like Flink, Spark, the Python Portable runner, and Dataflow.
>> > >>
>> > >> API Stability
>> > >>
> >> > >> The Go SDK hasn't meaningfully changed its user API for DoFn and pipeline construction since it was first merged in, and there are no
> >> > >> changes to it on the horizon that can't be made in a backwards-compatible manner. Largely these are related to new features, or
> >> > >> usability improvements enabled by the advent of Go generics (think of "real" KV, emitter, and iterator types).
>> > >>
> >> > >> It's an open secret that the Go SDK has largely been developed for use within Google. Its use there is called FlumeGo: the
> >> > >> Apache Beam Go SDK running on top of Flume, Google's batch pipeline processing engine. Hence most of the focus has been on improving
> >> > >> batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or
> >> > >> usability reasons.
>> > >>
>> > >> Scalability
>> > >>
> >> > >> Google could get away without the Go SDK having an SDK-side scalability solution as a result of its integration with Flume.
>> > >> However, those days are now past.
>> > >>
> >> > >> The Go SDK now supports SplittableDoFns along with dynamic splitting, which enables writing scalable batch transforms natively
> >> > >> in the Go SDK.
> >> > >> The SDK also supports cross-language transforms, with Beam Schema encodings. With them, production-hardened transforms
> >> > >> from Java and Python are a wrapper away.
>> > >>
> >> > >> Presently, Daniel Oliveira (who implemented the SDF-side work and completed the xlang work) is adding a wrapper for the
> >> > >> Java Kafka IO using cross-language transforms, which has often been requested. This will also enable use of the Beam SQL
> >> > >> transforms that Java provides.
>> > >>
>> > >> Features
>> > >>
> >> > >> The Go SDK implements the Beam core. It implements the standard coders, allows for user DoFns and CombineFns, and provides access
> >> > >> to core transforms like Flatten and GroupByKey, and to features like side inputs, windowing, and user metrics.
> >> > >> Basic windowing will be fully supported for batch, even through lifted combines, in the 2.32.0 release.
>> > >>
>> > >> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
>> > >>
>> > >> Repo Testing
>> > >>
> >> > >> On precommit, the Go SDK runs all its unit tests. On top of that, it runs all its integration tests against the Python Portable runner,
> >> > >> which detects breaking changes quickly and reliably without overspending community resources. Those same tests are also
> >> > >> run against Dataflow, Flink, and Spark.
>> > >>
>> > >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server),
>> > >> or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones
> >> > >> is on the wiki [2]. They are accessible to Go developers, as they're implemented with the standard Go testing tools.
>> > >>
>> > >> Shortcomings
>> > >> That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether they block
>> > >> being out of experimental.
>> > >>
> >> > >> At present, only textio has been implemented as a Splittable DoFn.
> >> > >> Once the Kafka wrapper is merged in, it will serve as the first example for future contributions of
> >> > >> new transform wrappers for the Go SDK.
> >> > >> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for cross-language use.
>> > >>
> >> > >> In the core SDK, more streaming-focused features have yet to be implemented, but they're largely additions to what exists already
> >> > >> rather than total rebuilds. Much of the work is defining how a user specifies their desires, and turning those into the appropriate
> >> > >> FnAPI requests at execution time. Back in October, I wrote at length on the wiki [1] about what's missing for additional streaming features.
>> > >>
>> > >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK,
> >> > >> in particular regarding the included transform libraries and examples.
>> > >>
>> > >> Moving Forward
>> > >>
> >> > >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and
> >> > >> am beginning to add missing content and fill in the Go-specific gaps. This will be tied to improving the GoDoc with more Go-specific
> >> > >> user documentation that isn't appropriate for the BPG.
> >> > >> I also plan to resolve the LICENSE issue around the public display of that GoDoc.
>> > >>
>> > >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental"
>> > >> language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers.
>> > >> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
>> > >>
> >> > >> The clearest signal to the Go community, however, will be migrating the SDK to use Go Modules for dependency version control,
>> > >> which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users
>> > >> on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the
>> > >> pkg.go.dev list at [4].
>> > >>
> >> > >> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided
> >> > >> a link barrage in this email, as links can distract from the point: the SDK is ready for folks to use, and we need to tell them that they can
> >> > >> rather than that they shouldn't.
>> > >>
>> > >> Robert Burke
>> > >> Defacto Beam Go TL
>> > >>
>> > >> [0] https://s.apache.org/beam-go-sdk-design-rfc
>> > >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
>> > >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
>> > >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
>> > >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
>> >

Re: [Proposal] Go SDK Exits Experimental

Posted by Kenneth Knowles <ke...@apache.org>.
+1 awesome

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Burke <lo...@apache.org>.
Sounds reasonable to me. I agree. We'll aim to get those (Go modules and LICENSE issue) done before the 2.32 cut, and certainly before the 2.33 cut if release images aren't added to the 2.32 process.

Regarding Go generics: at some point in the future, we may want a harder break between a newer generics-first API and the current version, but there's no rush. Generics/type parameters in Go aren't identical to the feature referred to by that term in Java, C++, Rust, etc., so it'll take a bit of time for that expertise to develop.
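To make the generics discussion concrete, here is a sketch of what generic KV, emitter, and iterator helpers, plus a generic Max-per-key step, could look like in Go 1.18+. The types `KV`, `Emitter`, and `Iterator` and the function `maxPerKey` are hypothetical and are not Beam Go SDK APIs. The local `Ordered` constraint stands in for whatever constraint a library would provide.

```go
package main

import "fmt"

// KV is a hypothetical generic key/value pair.
type KV[K comparable, V any] struct {
	Key   K
	Value V
}

// Emitter is a hypothetical typed emitter: a function a DoFn calls to output.
type Emitter[T any] func(T)

// Iterator is a hypothetical typed iterator over grouped values.
type Iterator[T any] interface {
	Next() (T, bool)
}

// sliceIter adapts a slice to Iterator for this example.
type sliceIter[T any] struct {
	vals []T
	i    int
}

func (s *sliceIter[T]) Next() (T, bool) {
	var zero T
	if s.i >= len(s.vals) {
		return zero, false
	}
	v := s.vals[s.i]
	s.i++
	return v, true
}

// Ordered is a local constraint for types that support < and >.
type Ordered interface {
	~int | ~int64 | ~float64 | ~string
}

// maxPerKey reads all values for one key and emits a single KV holding the
// largest value. It emits nothing if the iterator is empty.
func maxPerKey[K comparable, V Ordered](key K, values Iterator[V], emit Emitter[KV[K, V]]) {
	best, ok := values.Next()
	if !ok {
		return
	}
	for v, more := values.Next(); more; v, more = values.Next() {
		if v > best {
			best = v
		}
	}
	emit(KV[K, V]{Key: key, Value: best})
}

func main() {
	var out []KV[string, int]
	emit := func(kv KV[string, int]) { out = append(out, kv) }
	maxPerKey[string, int]("a", &sliceIter[int]{vals: []int{3, 9, 4}}, emit)
	fmt.Println(out) // [{a 9}]
}
```

The explicit type arguments in `maxPerKey[string, int]` keep the call unambiguous. Depending on the Go version, inference across interface and named function types is more limited.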

However, because of the current nature of Go, we already needed fairly sophisticated reflective analysis to handle DoFns and map them to their graph inputs. So adding new helpers like KV, emitter, and iterator types shouldn't be too difficult. Changing Go SDK internals to use generics (like the implementation of stats DoFns such as Min, Max, etc.) could also be done transparently to most users, and certainly any of the framework for execution-time handling (the "worker's SDK harness") could be cleaned up if need be. Finally, more sophisticated DoFn registration and code generation could replace the optional code generator entirely, saving some users a `go generate` step and making improved execution performance simpler to get.
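The reflective analysis mentioned above can be sketched with Go's `reflect` package. This is a simplified assumption of how a framework might classify DoFn parameters, not the SDK's actual implementation, which handles many more cases. `classifyParams` and `ParamKind` are hypothetical names. The rules are modeled on the Beam Go convention: an emitter is a `func(...)` with no results, and an iterator is a `func(*T) bool`.

```go
package main

import (
	"context"
	"fmt"
	"reflect"
)

// ParamKind labels how a DoFn parameter would map onto the pipeline graph.
type ParamKind string

const (
	KindContext ParamKind = "context"
	KindEmitter ParamKind = "emitter"
	KindIter    ParamKind = "iterator"
	KindValue   ParamKind = "value"
)

var ctxType = reflect.TypeOf((*context.Context)(nil)).Elem()

// classifyParams inspects a function's signature and labels each parameter
// using these simplified rules:
//   - exactly context.Context      -> context
//   - func(*T) bool                -> iterator
//   - func(...) with no results    -> emitter
//   - anything else                -> value (key or main input element)
func classifyParams(fn any) ([]ParamKind, error) {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return nil, fmt.Errorf("not a function: %v", t)
	}
	kinds := make([]ParamKind, 0, t.NumIn())
	for i := 0; i < t.NumIn(); i++ {
		p := t.In(i)
		switch {
		case p == ctxType:
			kinds = append(kinds, KindContext)
		case p.Kind() == reflect.Func && p.NumIn() == 1 && p.In(0).Kind() == reflect.Pointer &&
			p.NumOut() == 1 && p.Out(0).Kind() == reflect.Bool:
			kinds = append(kinds, KindIter)
		case p.Kind() == reflect.Func && p.NumOut() == 0:
			kinds = append(kinds, KindEmitter)
		default:
			kinds = append(kinds, KindValue)
		}
	}
	return kinds, nil
}

// sumPerKey has the shape of a grouped DoFn: key, value iterator, emitter.
func sumPerKey(_ context.Context, key string, values func(*int) bool, emit func(string, int)) {
	total := 0
	var v int
	for values(&v) {
		total += v
	}
	emit(key, total)
}

func main() {
	kinds, err := classifyParams(sumPerKey)
	if err != nil {
		panic(err)
	}
	fmt.Println(kinds) // [context value iterator emitter]
}
```

Doing this analysis once at pipeline construction time, rather than per element, is what lets a reflective design stay affordable. Generated or generic code can then avoid reflection on the hot path.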

Changing things like making a type-parameterized PCollection would be far more involved, as would trying to use some kind of Apply format. The lack of method overrides prevents the apply-chaining approach, or at least prevents it from working simply.

Finally, Go Generics won't be available until Go 1.18, which isn't until next year. See https://blog.golang.org/generics-proposal for details. 

Go 1.17 (https://tip.golang.org/doc/go1.17) does include a register-based calling convention (initially on 64-bit x86), leading to a modest performance improvement across the board.

Cheers,
Robert Burke

On 2021/06/15 18:10:46, Robert Bradshaw <ro...@google.com> wrote: 
> +1 to declaring Golang support out of experimental once the Go Modules
> issues are solved. I don't think an SDK needs to support every feature
> to be accepted, especially now that we can do cross-language
> transforms, and Go definitely supports enough to be quite useful. (WRT
> streaming, my understanding is that Go supports the streaming model
> with windows and timestamps, and runs fine on a streaming runner, even
> if more advanced features like state and timers aren't yet available.)
> 
> This is a great milestone.
> 
> On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <ty...@google.com> wrote:
> >
> > WOW! Big news.
> >
> > I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked was whether there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are, and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics, it won't result in any backwards-incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
> >
> >
> >
> > On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lo...@apache.org> wrote:
> >>
> >> Hello Beam Community!
> >>
> >> I propose we stop calling the Apache Beam Go SDK experimental.
> >>
> >> This thread is for discussing it as a community, along with any remaining conditions that would prevent the exit.
> >>
> >> tl;dr
> >> Ask questions for answers and links! I have both.
> >> This entails officially including it in the release process, removing the various "experimental" text throughout the repo, etc.,
> >> and otherwise treating it like Python and Java. There are also some Go-specific tasks around dependency versioning.
> >>
> >> The Go SDK implements the Beam model efficiently for most batch tasks, including basic windowing.
> >> Apache Beam Go jobs can execute on, and are tested against, all portable runners.
> >> The core APIs are not going to change in incompatible ways going forward.
> >> Scalable transforms can be written through SplittableDoFns or via cross-language transforms.
> >>
> >> The SDK isn't 100% feature complete, but keeping it experimental doesn't help with that any further.
> >> Communities grow through contributions and use, and experimental markers dissuade users.
> >> There's plenty to do in order to expand what can be done with the SDK. (Contributions welcome!)
> >>
> >> Why Exit Experimental now?
> >>
> >> Typically when we call an SDK or API experimental, it's because there's a risk that the API or its behavior may change significantly.
> >> This, in turn, leads to additional work for users of the SDK on every release, which leads to sticking with older versions or forking
> >> to preserve behavior. Version updates should be looked forward to, and viewed as having little risk. Further, while there's been
> >> previous discussion about what the "low bar" is for a new SDK, it hasn't been formally applied to the Go SDK. I feel this has
> >> hurt development and contribution of new SDK languages (the inherent difficulty of SDK development notwithstanding).
> >>
> >> When the SDK was designed, it wasn't entirely clear what the Beam Model should look like in an opinionated language like Go.
> >> Their initial take (see https://s.apache.org/beam-go-sdk-design-rfc [0]) goes into detail what it means for a language without
> >> Generics, or overloading, or inheritance to implement the beam model. One could largely throw away static types (like Python),
> >> but this approach rings hollow for Go. It would not do if the approach couldn't grow and scale to the Beam Model. It's also hard
> >> to tell if an API is any good before there are users.
> >>
> >> Further, in the early days of Portability, there wasn't a way to write scalable DoFns, dynamically or otherwise. It's an incredible
> >> bottleneck to need to do all initial fanout of work on a single machine, write everything to a Reshuffle, just in order to scale up.
> >> Without being able to scale, Beam is little more than overhead.
> >>
> >> At this point, both of these needs are met within the Go SDK for open source.
> >>
> >> Background
> >>
> >> The Go SDK has been a part of the beam repo for a few years now, since it was accidentally merged into master.
> >> Since then it's been called experimental, and not officially part of the releases.
> >>
> >> Of the SDKs, it's was always designed around Beam Portability first. It never had any "Legacy" (SDK x Runner specific ) workers.
> >> It's always used the Beam Pipeline protos and FnAPI to execute jobs, first with some very experimental code on Dataflow, but now
> >> on all portable supported runners, like Flink, Spark, the Python Portable runner, and Dataflow.
> >>
> >> API Stability
> >>
> >> The Go SDK hasn't meaningfully changed it's user API for DoFn and pipeline construction since it was first merged in, and there are no
> >> changes to that on the horizon that can't be made in a backwards compatible manner. Largely these are related to New Features, or
> >> usability improvements enabled by the advent of Go Generics (think of "real" KV, emitter, and iterator types).
> >>
> >> It's an open secret that the Go SDK has largely been under work for use within Google. It's use is called FlumeGo, representing
> >> the Apache Beam Go SDK, running on top of Flume, Google's batch pipeline processing engine. Thus most of the focus on improving
> >> batch execution. FlumeGo sees ample use today, and there hasn't been a call for fundamental changes to the API for ergonomic or
> >> usability concerns.
> >>
> >> Scalability
> >>
> >> Google could get away without the Go SDK having an SDK side scalability solution as a result of it's integration with Flume.
> >> However, those days are now past.
> >>
> >> The Go SDK now supports SplittableDoFns along with Dynamic Splitting, which supports writing scalable batch transforms natively
> >> in the Go SDK.
> >> The SDK also supports Cross Language Transforms, with Beam Schema encodings. With it, production hardened transforms
> >> from Java and Python are a wrapper away.
> >>
> >> Presently, Daniel Oliveira (who implemented the SDF side work, and completed the Xlang work,) is adding a wrapper for the
> >> Java Kafka IO using Cross Language Transforms, which is often been requested. This will also enable use of the Beam SQL
> >> transforms that java enables.
> >>
> >> Features
> >>
> >> The Go SDK implements the Beam C=core. The Go SDK implements standard coders, allows for user DoFns, and CombineFns and access
> >> to core transforms like Flatten, GroupByKey, and features like Side Inputs, Windowing, and User Metrics.
> >> Basic windowing will be fully supported for batch even through lifted combines in the 2.32.0 release.
> >>
> >> All of the above enables Beam Go to be versatile for batch execution on portable runners, and for simple streaming pipelines.
> >>
> >> Repo Testing
> >>
> >> On precommit the Go SDK runs all it's unit tests. On top of that, it runs all it's integration tests against the Python Portable runner,
> >> making it quick and robust to detect breaking changes without overspending community resources. Those same tests are also
> >> run against Dataflow, Flink, and Spark.
> >>
> >> The tests are executable against all runners via the appropriate Go commands (if you've stood up your own job management server),
> >> or Gradle commands (which will spin up runner instances for you). Documentation for executing tests and adding new ones
> >> is on the wiki. [2] They are accessible to Go developers as they're implemented with the standard Go testing tools.
> >>
> >> Shortcomings
> >> That said, there's still much to do. Let me briefly tell you what doesn't work, and it's up to you to weigh whether they block
> >> being out of experimental.
> >>
> >> At present, only a textio has been implemented as Splittable DoFn.
> >> Once the Kafka wrapper is merged in, it will serve as a the first example for future contributions for
> >> new transform wrappers for the Go SDK.
> >> Transforms and IOs are lacking, but at this point users are empowered to write their own DoFns or wrap existing transforms for Cross Language use.
> >>
> >> In the core SDK, more streaming focused features have yet to be implemented, but they're largely additions to what exists already
> >> rather than total rebuilds. Much of the work is definining how a user specifies their desires, and turning those into the appropriate
> >> FnAPI requests at execution time. Back in October I wrote at length on the wiki [1] what's missing for additional streaming features.
> >>
> >> While we have bolstered our testing recently, there's likely still more we could test to improve our confidence in the SDK,
> >> in particular regarding the included transforms libraries and examples.
> >>
> >> Moving Forward
> >>
> >> My immediate plan is to work on incorporating the Go SDK fully into the Beam Programming Guide. I've audited the guide [3], and
> >> am beginning to add missing content and filling in the Go specific gaps. This will be tied to improving the Go Doc with more Go
> >> specific user documentation that isn't appropriate for the BPG.
> >> And resolving the LICENSE issue around the public display of that GoDoc.
> >>
> >> If this proposal is accepted by a binding vote, I will incorporate the SDK into the release process, and remove the "experimental"
> >> language around the SDK. This largely entails updating the release scripts to also build and publish the Go SDK Docker containers.
> >> As for releasing the code, we're technically already doing so whenever we tag a release branch [4].
> >>
> >> The clearest signal to the Go community however will be migrating the SDK to use Go Modules for dependency version control,
> >> which Daniel is planning on working on after his Kafka task. This will put our repo infrastructure, SDK contributors, and users
> >> on the same footing when it comes to dependency management. It will remove the "+incompatible" tags one sees on the
> >> pkg.go.dev list at [4].
> >>
> >> I'm very happy to answer any questions you might have about the SDK, and provide additional links as needed. I intentionally avoided
> >> a link barrage in this email, as they can distract from the point: The SDK is ready for folks to use it, we need to tell them that they can
> >> rather than they shouldn't.
> >>
> >> Robert Burke
> >> Defacto Beam Go TL
> >>
> >> [0] https://s.apache.org/beam-go-sdk-design-rfc
> >> [1] https://cwiki.apache.org/confluence/display/BEAM/Supporting+Streaming+in+the+Go+SDK
> >> [2] https://cwiki.apache.org/confluence/display/BEAM/Go+Tips
> >> [3] https://docs.google.com/spreadsheets/d/1DrBFjxPBmMMmPfeFr6jr_JndxGOes8qDqKZ2Uxwvvds/edit?resourcekey=0-tVFwcLrQ2v2jpZkHk6QOpQ#gid=2072310090 (SDK Audit sheet)
> >> [4] https://pkg.go.dev/github.com/apache/beam/sdks/go/pkg/beam?tab=versions
> 

Re: [Proposal] Go SDK Exits Experimental

Posted by Robert Bradshaw <ro...@google.com>.
+1 to declaring Golang support out of experimental once the Go Modules
issues are solved. I don't think an SDK needs to support every feature
to be accepted, especially now that we can do cross-language
transforms, and Go definitely supports enough to be quite useful. (WRT
streaming, my understanding is that Go supports the streaming model
with windows and timestamps, and runs fine on a streaming runner, even
if more advanced features like state and timers aren't yet available.)

This is a great milestone.

On Tue, Jun 15, 2021 at 10:12 AM Tyson Hamilton <ty...@google.com> wrote:
>
> WOW! Big news.
>
> I'm supportive of leaving experimental status after Go Modules are completed and the LICENSE issue is resolved. I don't think that lacking streaming support is a blocker. The other thing I checked to see was if there were metrics available on metrics.beam.apache.org, specifically for measuring code health via post-commit over time, which there are and the passing test rate is high (Huzzah!). The one thing that surprised me from your summary is that when Go introduces generics it won't result in any backwards incompatible changes in Apache Beam. That's great news, but does it mean there will be a need to support both non-generic and generic APIs moving forward? It seems like generics will be introduced in the Go 1.17 release (optimistically) in August this year.
>
>
>
> On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lo...@apache.org> wrote:

Re: [Proposal] Go SDK Exits Experimental

Posted by Tyson Hamilton <ty...@google.com>.
WOW! Big news.

I'm supportive of leaving experimental status after Go Modules are
completed and the LICENSE issue is resolved. I don't think that lacking
streaming support is a blocker. The other thing I checked was whether
metrics are available on metrics.beam.apache.org, specifically for
measuring code health via post-commit results over time. They are, and the
passing test rate is high (Huzzah!). The one thing that surprised me from
your summary is that when Go introduces generics it won't result in any
backwards incompatible changes in Apache Beam. That's great news, but does
it mean there will be a need to support both non-generic and generic APIs
moving forward? It seems like generics will be introduced in the Go 1.17
release (optimistically) in August this year.
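On the generics question, one plausible way for both APIs to coexist is to layer typed generic helpers over the existing reflection-based entry points, so that old call sites keep compiling unchanged. The sketch below assumes that approach. `applyUntyped` and `Map` are hypothetical stand-ins, not Beam functions, and the code needs Go 1.18 or later, the release in which type parameters actually shipped.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// applyUntyped is a hypothetical stand-in for a pre-generics, reflection-based
// API in the spirit of a ParDo that accepts its DoFn as interface{}: the shape
// of fn is only checked at runtime.
func applyUntyped(fn interface{}, in []interface{}) ([]interface{}, error) {
	if fn == nil {
		return nil, errors.New("fn is nil")
	}
	v := reflect.ValueOf(fn)
	t := v.Type()
	if t.Kind() != reflect.Func || t.NumIn() != 1 || t.NumOut() != 1 {
		return nil, fmt.Errorf("want func(X) Y, got %v", t)
	}
	out := make([]interface{}, 0, len(in))
	for _, x := range in {
		if x == nil {
			return nil, errors.New("nil element")
		}
		xv := reflect.ValueOf(x)
		if !xv.Type().AssignableTo(t.In(0)) {
			return nil, fmt.Errorf("element %v is not assignable to %v", x, t.In(0))
		}
		out = append(out, v.Call([]reflect.Value{xv})[0].Interface())
	}
	return out, nil
}

// Map is a hypothetical generic helper layered on applyUntyped. Type
// parameters let the compiler reject a mismatched fn, while existing
// callers of the untyped function are untouched, so both APIs coexist.
func Map[In, Out any](fn func(In) Out, in []In) []Out {
	untyped := make([]interface{}, len(in))
	for i, x := range in {
		untyped[i] = x
	}
	res, err := applyUntyped(fn, untyped)
	if err != nil {
		// Only reachable for nil interface elements, which the untyped path rejects.
		panic(err)
	}
	out := make([]Out, len(res))
	for i, r := range res {
		out[i], _ = r.(Out) // a nil result leaves the zero value of Out
	}
	return out
}

func main() {
	// Untyped: the mismatch between fn and the elements surfaces at runtime.
	_, err := applyUntyped(func(s string) int { return len(s) }, []interface{}{1, 2})
	fmt.Println("untyped error:", err) // untyped error: element 1 is not assignable to string

	// Generic: same machinery underneath, but Map(fn, []int{1, 2}) would not compile.
	fmt.Println("lengths:", Map(func(s string) int { return len(s) }, []string{"go", "beam"})) // lengths: [2 4]
}
```

The generic layer adds compile-time checking without replacing anything, so in this sketch the non-generic entry point keeps working for existing code.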



On Thu, Jun 10, 2021 at 5:04 PM Robert Burke <lo...@apache.org> wrote:
