Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/17 17:52:39 UTC

[GitHub] [beam] damccorm opened a new pull request, #22761: Go stateful DoFns user side changes

damccorm opened a new pull request, #22761:
URL: https://github.com/apache/beam/pull/22761

   Adds all the user-facing plumbing and validation we need to make stateful DoFns work in Go. Follows the design laid out in https://docs.google.com/document/d/1rcKa1Z6orDDFr1l8t6NA1eLl6zanQbYAEiAqk39NQUU/edit?usp=sharing
   
   I have the full version of this (not cleaned up, and not quite 100% working on the side that talks to the state API) in this PR - https://github.com/damccorm/beam/pull/70/files#diff-b2583776e71d93a9777a2ababf6ae018ff116c53c905486689aced60fdf44a4e - and was able to verify that it allows us to write a pipeline with a value state object and read from/write to that state. More work is needed to get the execution piece to 100%.
   
   This is the first part of #22736, which also has next steps laid out (see my first comment in that issue).
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] damccorm commented on a diff in pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22761:
URL: https://github.com/apache/beam/pull/22761#discussion_r948414002


##########
sdks/go/pkg/beam/core/state/state.go:
##########
@@ -0,0 +1,117 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+// Package state contains structs for reading and manipulating pipeline state.
+package state
+
+import (
+	"reflect"
+)
+
+type TransactionType_Enum int32
+
+const (
+	TransactionType_Set   TransactionType_Enum = 0
+	TransactionType_Clear TransactionType_Enum = 1
+)
+
+var (
+	ProviderType = reflect.TypeOf((*Provider)(nil)).Elem()
+)
+
+// TODO(#20510) - add other forms of state (MapState, BagState, CombiningState), prefetch, and clear.
+
+// Transaction is used to represent a pending state transaction. This should not be manipulated directly;
+// it is primarily used for implementations of the Provider interface to talk to the various State objects.
+type Transaction struct {
+	Key  string
+	Type TransactionType_Enum
+	Val  interface{}
+}
+
+// Provider represents the DoFn parameter used to get and manipulate pipeline state
+// stored as key value pairs (https://beam.apache.org/documentation/programming-guide/#state-and-timers).
+// This should not be manipulated directly. Instead it should be used as a parameter
+// to functions on State objects like state.Value.
+type Provider interface {
+	ReadValueState(id string) (interface{}, []Transaction, error)
+	WriteValueState(val Transaction) error
+}
+
+type PipelineState interface {

Review Comment:
   Good call, I added one.
   
   I may end up moving this to a different package when all is said and done (most likely when I have more execution-focused state stuff), since I don't really want users implementing it. It will also eventually probably need some metadata about what state type should be used (ValueState in this case). Basically, it's subject to change in my next PR(s) since it's more of an internals concept.
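
For readers following the thread, here is a minimal, self-contained sketch of how a `Provider` and its `Transaction` log could fit together. It copies the shapes of `Transaction` and `Provider` from the diff above (with shortened enum names); `memProvider`, `newMemProvider`, and `readInt` are hypothetical names invented for illustration. The replay logic is an assumption about why `ReadValueState` returns pending transactions alongside a value; it is not taken from the Beam implementation.

```go
package main

import "fmt"

// TransactionType mirrors TransactionType_Enum from the diff above.
type TransactionType int32

const (
	Set   TransactionType = 0
	Clear TransactionType = 1
)

// Transaction mirrors the struct in the diff above.
type Transaction struct {
	Key  string
	Type TransactionType
	Val  interface{}
}

// Provider mirrors the interface in the diff above.
type Provider interface {
	ReadValueState(id string) (interface{}, []Transaction, error)
	WriteValueState(val Transaction) error
}

// memProvider is a hypothetical in-memory Provider. It keeps a committed
// value per key plus the transactions written since that value was stored.
type memProvider struct {
	committed map[string]interface{}
	pending   map[string][]Transaction
}

func newMemProvider() *memProvider {
	return &memProvider{
		committed: map[string]interface{}{},
		pending:   map[string][]Transaction{},
	}
}

// ReadValueState returns the committed value and any pending transactions.
func (p *memProvider) ReadValueState(id string) (interface{}, []Transaction, error) {
	return p.committed[id], p.pending[id], nil
}

// WriteValueState records the transaction without applying it.
func (p *memProvider) WriteValueState(val Transaction) error {
	p.pending[val.Key] = append(p.pending[val.Key], val)
	return nil
}

// readInt resolves the current value for key by replaying the pending
// transactions, in order, on top of the committed value. The bool result
// reports whether a value is present (assumed semantics).
func readInt(p Provider, key string) (int, bool, error) {
	cur, txns, err := p.ReadValueState(key)
	if err != nil {
		return 0, false, err
	}
	for _, t := range txns {
		switch t.Type {
		case Set:
			cur = t.Val
		case Clear:
			cur = nil
		}
	}
	if cur == nil {
		return 0, false, nil
	}
	v, ok := cur.(int)
	if !ok {
		return 0, false, fmt.Errorf("state %q holds %T, want int", key, cur)
	}
	return v, true, nil
}

func main() {
	p := newMemProvider()
	if err := p.WriteValueState(Transaction{Key: "count", Type: Set, Val: 1}); err != nil {
		panic(err)
	}
	v, ok, err := readInt(p, "count")
	fmt.Println(v, ok, err)
}
```

Keeping the replay in a helper rather than in the provider is one way to let the state object, not the user, interpret transactions, which matches the note in the diff that `Transaction` should not be manipulated directly.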





[GitHub] [beam] damccorm commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218366452

   Run Go PreCommit




[GitHub] [beam] riteshghorse commented on a diff in pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
riteshghorse commented on code in PR #22761:
URL: https://github.com/apache/beam/pull/22761#discussion_r948390126


##########
sdks/go/pkg/beam/core/funcx/fn.go:
##########
@@ -374,6 +390,8 @@ func New(fn reflectx.Func) (*Fn, error) {
 			kind = FnWindow
 		case t == typex.BundleFinalizationType:
 			kind = FnBundleFinalization
+		case t == state.ProviderType:

Review Comment:
   Should we define the types for state and timers in the typex package? I was thinking the same for timers.



##########
sdks/go/pkg/beam/core/state/state.go:
##########
@@ -0,0 +1,117 @@
+// Package state contains structs for reading and manipulating pipeline state.
+package state
+
+import (
+	"reflect"
+)
+
+type TransactionType_Enum int32
+
+const (
+	TransactionType_Set   TransactionType_Enum = 0
+	TransactionType_Clear TransactionType_Enum = 1
+)
+
+var (
+	ProviderType = reflect.TypeOf((*Provider)(nil)).Elem()
+)
+
+// TODO(#20510) - add other forms of state (MapState, BagState, CombiningState), prefetch, and clear.
+
+// Transaction is used to represent a pending state transaction. This should not be manipulated directly;
+// it is primarily used for implementations of the Provider interface to talk to the various State objects.
+type Transaction struct {
+	Key  string
+	Type TransactionType_Enum
+	Val  interface{}
+}
+
+// Provider represents the DoFn parameter used to get and manipulate pipeline state
+// stored as key value pairs (https://beam.apache.org/documentation/programming-guide/#state-and-timers).
+// This should not be manipulated directly. Instead it should be used as a parameter
+// to functions on State objects like state.Value.
+type Provider interface {
+	ReadValueState(id string) (interface{}, []Transaction, error)
+	WriteValueState(val Transaction) error
+}
+
+type PipelineState interface {

Review Comment:
   This needs a doc comment. It's also fine if you plan to add it in a follow-up PR.





[GitHub] [beam] damccorm commented on a diff in pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #22761:
URL: https://github.com/apache/beam/pull/22761#discussion_r948406719


##########
sdks/go/pkg/beam/core/funcx/fn.go:
##########
@@ -374,6 +390,8 @@ func New(fn reflectx.Func) (*Fn, error) {
 			kind = FnWindow
 		case t == typex.BundleFinalizationType:
 			kind = FnBundleFinalization
+		case t == state.ProviderType:

Review Comment:
   I'd probably vote we leave it as is unless there's a reason to switch. The [typex package generally only exports types for structs it defines](https://github.com/apache/beam/blob/48bad7d966a583055669850eb9fb558782f636a8/sdks/go/pkg/beam/core/typex/special.go#L45), and IMO it makes sense to keep this in the state package to stay consistent with that. I don't feel super strongly though (@lostluck might have opinions too).
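
As background for this discussion, the sketch below shows the general reflection pattern behind `ProviderType` and the `case t == state.ProviderType` branch: capture the `reflect.Type` of an interface once, then compare it against each parameter type of a function. The `Provider` interface here is a trimmed stand-in and `providerParams` is a hypothetical helper; neither is the actual `funcx` code.

```go
package main

import (
	"fmt"
	"reflect"
)

// Provider is a trimmed stand-in for the state.Provider interface.
type Provider interface {
	ReadValueState(id string) (interface{}, error)
}

// providerType is the reflect.Type of the interface itself. A nil *Provider
// is used only to name the type; Elem() steps from the pointer type to the
// interface type.
var providerType = reflect.TypeOf((*Provider)(nil)).Elem()

// providerParams returns the indices of fn's parameters whose declared type
// is exactly Provider.
func providerParams(fn interface{}) ([]int, error) {
	t := reflect.TypeOf(fn)
	if t == nil || t.Kind() != reflect.Func {
		return nil, fmt.Errorf("expected a function, got %v", t)
	}
	var idx []int
	for i := 0; i < t.NumIn(); i++ {
		if t.In(i) == providerType {
			idx = append(idx, i)
		}
	}
	return idx, nil
}

func main() {
	fn := func(p Provider, word string, count int) {}
	idx, err := providerParams(fn)
	fmt.Println(idx, err)
}
```

Because the comparison is on exact type identity, the package that declares the interface is a natural home for the exported `reflect.Type` value, which is the trade-off being weighed above.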





[GitHub] [beam] github-actions[bot] commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218394979

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @riteshghorse for label go.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).




[GitHub] [beam] damccorm merged pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm merged PR #22761:
URL: https://github.com/apache/beam/pull/22761




[GitHub] [beam] codecov[bot] commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
codecov[bot] commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218339544

   # [Codecov](https://codecov.io/gh/apache/beam/pull/22761?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#22761](https://codecov.io/gh/apache/beam/pull/22761?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7bfe0c2) into [master](https://codecov.io/gh/apache/beam/commit/91c4b87aa95d89aac806ef374fda63637960bd6c?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (91c4b87) will **decrease** coverage by `0.05%`.
   > The diff coverage is `44.27%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #22761      +/-   ##
   ==========================================
   - Coverage   74.20%   74.14%   -0.06%     
   ==========================================
     Files         710      711       +1     
     Lines       93547    93738     +191     
   ==========================================
   + Hits        69415    69501      +86     
   - Misses      22855    22957     +102     
   - Partials     1277     1280       +3     
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | go | `51.49% <44.27%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/22761?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/go/pkg/beam/core/graph/edge.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL2dyYXBoL2VkZ2UuZ28=) | `3.35% <ø> (ø)` | |
   | [sdks/go/pkg/beam/core/runtime/exec/translate.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy90cmFuc2xhdGUuZ28=) | `13.94% <0.00%> (-0.34%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/runtime/graphx/serialize.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZ3JhcGh4L3NlcmlhbGl6ZS5nbw==) | `27.34% <0.00%> (-0.17%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/runtime/graphx/translate.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZ3JhcGh4L3RyYW5zbGF0ZS5nbw==) | `41.58% <0.00%> (-0.88%)` | :arrow_down: |
   | [sdks/go/pkg/beam/pardo.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9wYXJkby5nbw==) | `44.44% <10.00%> (-2.97%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/runtime/exec/fn.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3J1bnRpbWUvZXhlYy9mbi5nbw==) | `67.48% <21.42%> (-2.07%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/funcx/fn.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL2Z1bmN4L2ZuLmdv) | `57.30% <31.25%> (-1.57%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/graph/fn.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL2dyYXBoL2ZuLmdv) | `84.23% <68.85%> (-1.11%)` | :arrow_down: |
   | [sdks/go/pkg/beam/core/state/state.go](https://codecov.io/gh/apache/beam/pull/22761/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9nby9wa2cvYmVhbS9jb3JlL3N0YXRlL3N0YXRlLmdv) | `78.37% <78.37%> (ø)` | |
   
   




[GitHub] [beam] damccorm commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218335794

   Run Go PreCommit




[GitHub] [beam] damccorm commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
damccorm commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218387623

   R: @riteshghorse 




[GitHub] [beam] github-actions[bot] commented on pull request #22761: Go stateful DoFns user side changes

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #22761:
URL: https://github.com/apache/beam/pull/22761#issuecomment-1218403906

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control

