Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 23:53:13 UTC

[GitHub] [beam] kennknowles opened a new issue, #19404: Improve SpannerIO output

kennknowles opened a new issue, #19404:
URL: https://github.com/apache/beam/issues/19404

   From a discussion in [https://github.com/apache/beam/pull/8097](https://github.com/apache/beam/pull/8097):
    SpannerIO produces two output PCollections:
    * `getOutput()` -> `PCollection<Void>`
    ** Never contains any elements
    ** In the GlobalWindow
    ** Closed when the input PCollection is closed (i.e., never in streaming pipelines), signaling that all input has been written
    ** Used in batch pipelines for 'dependent' bulk imports, where one dataset is not written to Spanner until another has finished writing (needed to maintain parent/child, one-to-many, referential integrity)
    * `getFailedMutations()` -> `PCollection<MutationGroup>`
    ** Only contains elements when Mutation[Group]s fail to be written
    ** In the GlobalWindow
    ** Of limited use, because the reason for the failure is not included
   
   Suggestion:
    * Deprecate these existing outputs.
    * Make the primary output a `PCollection<{MutationGroup, CommitTimestamp}>` so that successfully written Mutation[Group]s can be processed further if necessary.
   ({a, b} denotes a container class holding these values)
    * Add an additional output of failed mutations: `PCollection<{MutationGroup, FailureMessage}>`.
    * The existing outputs can be derived from these new outputs.
   
   This enables useful error reporting and handling based on the failure message, and allows successfully written mutations to be processed further.
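As a rough illustration of the proposal, the following plain-Java sketch models the two suggested outputs. It uses lists in place of PCollections, so it runs without Beam. Several names are hypothetical and not part of the SpannerIO API:

* `MutationGroup` here is a simplified stand-in named after Beam's class, not the real one.
* `WrittenMutationGroup` and `FailedMutationGroup` are illustrative container classes for the `{a, b}` pairs.
* The helper methods are also illustrative.

The sketch shows three things. First, the existing failed-mutations output could be derived from the new failure output. Second, the failure message makes per-failure reporting possible. Third, successful writes can be processed further.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Plain-Java model of the proposed SpannerIO outputs. Lists stand in for
// PCollections; every class and method name here is hypothetical.
public class SpannerOutputSketch {

  // Hypothetical stand-in for a group of mutations written atomically.
  public static final class MutationGroup {
    public final String description;
    public MutationGroup(String description) { this.description = description; }
    @Override public String toString() { return description; }
  }

  // Proposed primary output element: {MutationGroup, CommitTimestamp}.
  public static final class WrittenMutationGroup {
    public final MutationGroup group;
    public final Instant commitTimestamp;
    public WrittenMutationGroup(MutationGroup group, Instant commitTimestamp) {
      this.group = group;
      this.commitTimestamp = commitTimestamp;
    }
  }

  // Proposed failure output element: {MutationGroup, FailureMessage}.
  public static final class FailedMutationGroup {
    public final MutationGroup group;
    public final String failureMessage;
    public FailedMutationGroup(MutationGroup group, String failureMessage) {
      this.group = group;
      this.failureMessage = failureMessage;
    }
  }

  // Derives the contents of the existing getFailedMutations() output:
  // the same groups, with the failure message dropped.
  public static List<MutationGroup> legacyFailedMutations(List<FailedMutationGroup> failures) {
    List<MutationGroup> groups = new ArrayList<>();
    for (FailedMutationGroup f : failures) {
      groups.add(f.group);
    }
    return groups;
  }

  // Error reporting that the failure message makes possible.
  public static List<String> failureReport(List<FailedMutationGroup> failures) {
    List<String> lines = new ArrayList<>();
    for (FailedMutationGroup f : failures) {
      lines.add(f.group + " failed: " + f.failureMessage);
    }
    return lines;
  }

  // One example of further processing of successful writes:
  // the latest commit timestamp, or empty if nothing was written.
  public static Optional<Instant> latestCommit(List<WrittenMutationGroup> written) {
    return written.stream().map(w -> w.commitTimestamp).max(Instant::compareTo);
  }

  public static void main(String[] args) {
    // Illustrative data only.
    List<WrittenMutationGroup> written = List.of(
        new WrittenMutationGroup(new MutationGroup("insert Singers(1)"),
            Instant.parse("2019-03-26T10:00:00Z")),
        new WrittenMutationGroup(new MutationGroup("insert Singers(2)"),
            Instant.parse("2019-03-26T10:00:05Z")));
    List<FailedMutationGroup> failed = List.of(
        new FailedMutationGroup(new MutationGroup("insert Albums(1, 1)"),
            "example message: parent row not found"));

    System.out.println(latestCommit(written).orElse(null));
    System.out.println(legacyFailedMutations(failed));
    failureReport(failed).forEach(System.out::println);
  }
}
```

In a real pipeline, the `PCollection<Void>` completion signal from `getOutput()` could presumably also be derived from the new primary output. One way would be to pass that output as a signal to `Wait.on(...)`. That part is not modeled here because it depends on Beam.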
   
   (see also BEAM-6887)
   
   Imported from Jira [BEAM-6921](https://issues.apache.org/jira/browse/BEAM-6921). Original Jira may contain additional context.
   Reported by: nielm.

