Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 15:59:44 UTC

[GitHub] [beam] jrmccluskey opened a new pull request, #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

jrmccluskey opened a new pull request, #17956:
URL: https://github.com/apache/beam/pull/17956

   Adds a small code snippet to the Beam Programming Guide that demonstrates self-checkpointing behavior in the Beam Go SDK.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jrmccluskey commented on pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146139843

   R: @riteshghorse @damccorm 



[GitHub] [beam] asf-ci commented on pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
asf-ci commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146103311

   Can one of the admins verify this patch?




[GitHub] [beam] jrmccluskey commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889195081


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   Adding an error return value is reasonable, so I've added it. It did lead to some extra error-checking overhead, since Go doesn't have the try-catch mechanism the other SDKs leverage, but that's not a huge problem.




[GitHub] [beam] github-actions[bot] commented on pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146286647

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control



[GitHub] [beam] jrmccluskey commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889194510


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {
+  position := rt.GetRestriction().(offsetrange.Restriction).Start
+  for {
+    records, err := fn.ExternalService.readNextRecords(position)
+    if err == fn.ExternalService.ThrottlingErr {
+      return sdf.ResumeProcessingIn(60 * time.Second)
+    }
+    if len(records) == 0 {
+      return sdf.ResumeProcessingIn(10 * time.Second)

Review Comment:
   That's a fair note. Adding clarifying comments is always good for a documentation snippet.




[GitHub] [beam] jrmccluskey commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889190683


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   This is a completely fictional IO since we don't actually have a robust native streaming IO, so there's nothing to compile. It's just modeled after the Python and Java versions. 




[GitHub] [beam] lostluck merged pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
lostluck merged PR #17956:
URL: https://github.com/apache/beam/pull/17956



[GitHub] [beam] github-actions[bot] commented on pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146254738

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @riteshghorse for label go.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).



[GitHub] [beam] damccorm commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889155710


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   Can we put this in the snippets folder (example below in Watermark estimation section)? I know we haven't been clean on that before, but it:
   
   (a) makes sure that the code actually compiles
   (b) makes it easier to reuse (e.g. I know Dataflow has docs that use snippets from Beam)



##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {
+  position := rt.GetRestriction().(offsetrange.Restriction).Start
+  for {
+    records, err := fn.ExternalService.readNextRecords(position)
+    if err == fn.ExternalService.ThrottlingErr {
+      return sdf.ResumeProcessingIn(60 * time.Second)
+    }
+    if len(records) == 0 {
+      return sdf.ResumeProcessingIn(10 * time.Second)

Review Comment:
   Maybe add a comment along the lines of `// Wait for data to be available`? Might be nice to have a similar comment for the throttling case and the finish execution case as well.



##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   Could you return an `err` value as well (it can just return nil)? Something I realized with Bundle Finalization is that it's much more helpful if we provide the parameters that surround the one we are demonstrating, because it allows users to see the ordering we require.
   
   Side note unrelated to this PR: We probably need better ordering error messages; they are pretty confusing right now.



##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   I'm also curious about what process continuation we should return when we return an error - is it nil? Might be worth including that as an option if, for example, `records, err := fn.ExternalService.readNextRecords(position)` returns a non-nil, non-throttling error response.




[GitHub] [beam] damccorm commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
damccorm commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889196045


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   Yeah - I don't care if we make up empty functions for that, we already do that for a number of the existing snippets. The process continuation stuff should compile though, and that's the important bit anyways.





[GitHub] [beam] jrmccluskey commented on a diff in pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on code in PR #17956:
URL: https://github.com/apache/beam/pull/17956#discussion_r889191309


##########
website/www/site/content/en/documentation/programming-guide.md:
##########
@@ -6422,7 +6422,26 @@ resource utilization.
 {{< /highlight >}}
 
 {{< highlight go >}}
-This is not supported yet, see BEAM-11104.
+func (fn *splittableDoFn) ProcessElement(rt *sdf.LockRTracker, emit func(Record)) sdf.ProcessContinuation {

Review Comment:
   IIRC if an SDF signature has a ProcessContinuation return we *always* expect either a Resume() or Stop() continuation and never a nil. 




[GitHub] [beam] jrmccluskey commented on pull request #17956: [BEAM-11104] Add code snippet for Go SDK Self-Checkpointing

Posted by GitBox <gi...@apache.org>.
jrmccluskey commented on PR #17956:
URL: https://github.com/apache/beam/pull/17956#issuecomment-1146285870

   R: @lostluck 

