Posted to issues@beam.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2022/03/01 21:47:00 UTC

[jira] [Work logged] (BEAM-10976) Enable Bundle Finalization in Go SDK

     [ https://issues.apache.org/jira/browse/BEAM-10976?focusedWorklogId=734968&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-734968 ]

ASF GitHub Bot logged work on BEAM-10976:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Mar/22 21:46
            Start Date: 01/Mar/22 21:46
    Worklog Time Spent: 10m 
      Work Description: damccorm opened a new pull request #16980:
URL: https://github.com/apache/beam/pull/16980


   This is part 1 of 2 to add bundle finalization support to the Go SDK.
   
   **Summary of Overall Changes**
   
   Bundle finalization enables a DoFn to perform side effects after a runner has acknowledged that it has durably persisted the output. Right now, Java and Python support bundle finalization by letting a user register a callback function that is invoked when the runner acknowledges that it has persisted all output, but Go has no such support. This PR is part of a larger change to add that support to the Go SDK, as outlined in this [design doc](https://docs.google.com/document/d/1dLylt36oFhsWfyBaqPayYXqYHCICNrSZ6jmr51eqZ4k/edit#).
   
   I've completed all the changes (sans some better testing on the parts not in this PR). You can see the remaining files in the diff here: https://github.com/apache/beam/compare/master...damccorm:users/damccorm/bundle-finalization?expand=1
   
   **Summary of Changes in this PR**
   
   This PR adds most of the non-user-facing changes needed to enable bundle finalization. There are 3 major components:
   
   1. Changes to exec/pardo.go and exec/fn.go to add the bundleFinalizer type, pass it into the user's DoFn when appropriate, and invoke the callbacks on finalization.
   1. Changes to harness.go and plan.go to manage plans that require finalization and respond to the runner sending the bundle finalization message.
   1. Adding FinalizeBundle and GetBundleExpirationTime to the Unit interface and all structs implementing it. FinalizeBundle lets the finalization command propagate through the execution graph, and GetBundleExpirationTime lets us get, from anywhere in the graph, the time at which a bundle can be expired. This cascaded into a bunch of small changes that are responsible for most of the file changes in this PR.
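
As a rough illustration of how the three components above fit together, here is a minimal, self-contained sketch. It is not the code in this PR: the `callback`, `unit`, and `parDo` types, the method signatures, and the `demo` helper are simplified assumptions made for illustration, and only the names bundleFinalizer, FinalizeBundle, and GetBundleExpirationTime come from the description above.

```go
package main

import (
	"fmt"
	"time"
)

// callback pairs a registered function with the time after which it is
// no longer worth running. Hypothetical helper type.
type callback struct {
	validUntil time.Time
	fn         func() error
}

// bundleFinalizer collects the callbacks a DoFn registers during a bundle.
// Simplified stand-in for the type named in the PR description.
type bundleFinalizer struct {
	callbacks []callback
}

// RegisterCallback records fn and keeps it valid for d from now.
func (bf *bundleFinalizer) RegisterCallback(d time.Duration, fn func() error) {
	bf.callbacks = append(bf.callbacks, callback{validUntil: time.Now().Add(d), fn: fn})
}

// unit is a cut-down Unit interface holding only the two new methods.
// The signatures are assumptions, not the SDK's.
type unit interface {
	FinalizeBundle(now time.Time) error
	GetBundleExpirationTime() time.Time
}

// parDo owns a finalizer and forwards both calls to the next unit, if any.
type parDo struct {
	bf   *bundleFinalizer
	next unit
}

// FinalizeBundle runs this unit's unexpired callbacks, then propagates the
// call downstream. It returns the first error seen.
func (p *parDo) FinalizeBundle(now time.Time) error {
	var first error
	for _, cb := range p.bf.callbacks {
		if now.After(cb.validUntil) {
			continue // expired, so skip it
		}
		if err := cb.fn(); err != nil && first == nil {
			first = err
		}
	}
	p.bf.callbacks = nil
	if p.next != nil {
		if err := p.next.FinalizeBundle(now); err != nil && first == nil {
			first = err
		}
	}
	return first
}

// GetBundleExpirationTime returns the latest validUntil registered on this
// unit or on any unit downstream of it.
func (p *parDo) GetBundleExpirationTime() time.Time {
	var latest time.Time
	if p.next != nil {
		latest = p.next.GetBundleExpirationTime()
	}
	for _, cb := range p.bf.callbacks {
		if cb.validUntil.After(latest) {
			latest = cb.validUntil
		}
	}
	return latest
}

// demo wires two units together, registers one callback on each, and
// finalizes from the root.
func demo() (ran []string, late bool, err error) {
	sink := &parDo{bf: &bundleFinalizer{}}
	root := &parDo{bf: &bundleFinalizer{}, next: sink}
	root.bf.RegisterCallback(time.Minute, func() error { ran = append(ran, "root"); return nil })
	sink.bf.RegisterCallback(time.Hour, func() error { ran = append(ran, "sink"); return nil })
	// The sink's one-hour callback should dominate the expiration time.
	late = root.GetBundleExpirationTime().After(time.Now().Add(30 * time.Minute))
	err = root.FinalizeBundle(time.Now())
	return ran, late, err
}

func main() {
	ran, late, err := demo()
	fmt.Println(ran, late, err)
}
```

In this sketch the expiration time is the latest deadline anywhere in the chain, so the caller knows how long the plan has to be kept around before it can be dropped.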
   
   I would recommend reviewing the PR files in the order I just mentioned.
   
   **Additional testing done**
   
   On top of the unit tests added, using my full implementation (not just this partial one) I was also able to run an E2E example on Dataflow (FWIW, not all runners have finalization support, but I found that Dataflow does). In that example, I hijacked the wordcount example and added a bundleFinalizer to write a file to persistent storage for each line that had at least 3 words (chosen pretty randomly to minimize the chances of collisions). I'll omit the whole sample since it's long, but it produced a bunch of files like this:
   
   <img width="259" alt="image" src="https://user-images.githubusercontent.com/42773683/156254068-adab3a92-7978-4284-9963-8b80c8ecbf59.png">
   
   This indeed ran after the other data was persisted.
   
   **Next Steps**
   
   After this, I'll add the user-facing functionality in a follow-up PR, along with tests for it and at least one integration test for the whole flow.
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [ ] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [x] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 734968)
    Remaining Estimate: 0h
            Time Spent: 10m

> Enable Bundle Finalization in Go SDK
> ------------------------------------
>
>                 Key: BEAM-10976
>                 URL: https://issues.apache.org/jira/browse/BEAM-10976
>             Project: Beam
>          Issue Type: Sub-task
>          Components: sdk-go
>            Reporter: Robert Burke
>            Assignee: Danny McCormick
>            Priority: P3
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> E.g. to support acking pubsub/kafka messages as processed after the results have been properly committed by the runner.
> Note that, due to BEAM-10959, when implementing this an instruction must remain "active" until its finalization occurs as well. Specifically, we should probably keep another map around for "to be finalized" process bundle instructions so we can return the appropriate "empty" response and not accidentally evict them from the nearly equivalent inactive state until after finalization.
> [https://s.apache.org/beam-finalizing-bundles]
>  
> (To be updated once [https://github.com/apache/beam/pull/13160] is merged and the programming guide updated with SDF content.)
> See also Java and Python approaches
> https://beam.apache.org/documentation/programming-guide/#bundle-finalization
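
The "to be finalized" map suggested in the issue description above could look roughly like the sketch below. Everything in it (the `control` type, its fields and methods, and the `demo` helper) is hypothetical and only illustrates the bookkeeping idea. It is not the harness code.

```go
package main

import (
	"fmt"
	"time"
)

// control is a hypothetical, stripped-down view of the harness state.
type control struct {
	active   map[string]bool      // instructions still processing
	awaiting map[string]time.Time // finished instructions -> finalization deadline
}

// bundleDone retires a finished instruction. A zero deadline means no
// callbacks were registered, so nothing needs to be kept around.
func (c *control) bundleDone(id string, deadline time.Time) {
	delete(c.active, id)
	if !deadline.IsZero() {
		c.awaiting[id] = deadline
	}
}

// known reports whether id is still tracked, i.e. whether it is either
// processing or waiting to be finalized.
func (c *control) known(id string) bool {
	_, waiting := c.awaiting[id]
	return c.active[id] || waiting
}

// finalize handles a finalize request from the runner. It forgets the
// instruction and reports whether its callbacks should still run.
func (c *control) finalize(id string, now time.Time) bool {
	deadline, ok := c.awaiting[id]
	if !ok {
		return false
	}
	delete(c.awaiting, id)
	return !now.After(deadline)
}

// demo walks one instruction through completion and finalization.
func demo() (knownBefore, finalized, knownAfter bool) {
	c := &control{
		active:   map[string]bool{"inst-1": true},
		awaiting: map[string]time.Time{},
	}
	now := time.Now()
	c.bundleDone("inst-1", now.Add(time.Minute))
	knownBefore = c.known("inst-1")
	finalized = c.finalize("inst-1", now)
	knownAfter = c.known("inst-1")
	return knownBefore, finalized, knownAfter
}

func main() {
	fmt.Println(demo())
}
```

The point of the second map is that an instruction stays known between the end of processing and finalization, and is only forgotten once the runner has asked for finalization.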



--
This message was sent by Atlassian Jira
(v8.20.1#820001)