Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 18:03:12 UTC

[GitHub] [beam] damccorm opened a new issue, #20514: Return error message/information with existing FAILED_ROW data from BigQueryWriteFn (Python SDK)

damccorm opened a new issue, #20514:
URL: https://github.com/apache/beam/issues/20514

   A user may call `apache_beam.io.gcp.bigquery.WriteToBigQuery` to write streaming data to BigQuery. If any rows fail to write, the transform returns a tagged PCollection, `BigQueryWriteFn.FAILED_ROWS`, whose elements are tuples of the form `(destination_table, failed_row_payload)`.
   
   My suggestion is to include the error information in the `FAILED_ROWS` PCollection. The source code shows that the error information is already available at that point, e.g. that a row failed because field `id` was `invalid`, with the message `this field is not a record`. I think we should surface this to the user.
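
To illustrate the suggestion, here is a minimal sketch in plain Python (no Beam dependency). The shape of `insert_errors` is an assumption modeled on the per-row errors BigQuery reports for streaming inserts; `attach_errors` is a hypothetical helper, not a function in the Beam SDK.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
FailedRowWithErrors = Tuple[str, Row, List[Dict[str, Any]]]


def attach_errors(
    destination: str,
    rows: List[Row],
    insert_errors: List[Dict[str, Any]],
) -> List[FailedRowWithErrors]:
    """Pair each failed row with the errors reported for its index.

    Hypothetical helper: assumes each entry in insert_errors carries the
    position of the failing row under "index" and a list under "errors".
    """
    failed = []
    for entry in insert_errors:
        row = rows[entry["index"]]
        failed.append((destination, row, entry["errors"]))
    return failed


# Example batch in which the second row is rejected.
rows = [{"id": 1}, {"id": "not-a-record"}]
insert_errors = [
    {
        "index": 1,
        "errors": [
            {
                "reason": "invalid",
                "location": "id",
                "message": "This field is not a record.",
            }
        ],
    }
]

failed = attach_errors("project:dataset.table", rows, insert_errors)
```

Each element of `failed` then carries the destination, the original row, and the reason it was rejected, which is the information the issue asks to surface.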
   
   I'm happy to open a PR for this myself (as I've already had to override the original code in several projects), but it looks like we'd need a breaking change: either extending the tuple, which would cause unpacking issues in existing code, or returning a different data structure entirely.
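
The unpacking problem can be shown with a short sketch in plain Python. The element values and both consumer functions are hypothetical; they only illustrate how user code that unpacks exactly two items would react to a third item being appended.

```python
from typing import Any, Optional, Tuple

# Current element shape: (destination_table, failed_row_payload).
old_element = ("project:dataset.table", {"id": "x"})

# Hypothetical extended shape with error information appended.
new_element = old_element + ([{"reason": "invalid"}],)


def existing_consumer(element: Tuple[Any, ...]) -> Tuple[Any, Any]:
    """Typical user code today: unpacks exactly two items."""
    destination, row = element
    return destination, row


def tolerant_consumer(element: Tuple[Any, ...]) -> Tuple[Any, Any, Optional[Any]]:
    """Accepts both shapes by collecting any extra items with star-unpacking."""
    destination, row, *rest = element
    errors = rest[0] if rest else None
    return destination, row, errors


# The two-item unpacking raises ValueError on the extended tuple.
try:
    existing_consumer(new_element)
    broke = False
except ValueError:
    broke = True
```

Star-unpacking lets a consumer handle both shapes, but it requires every existing pipeline to be edited, which is why extending the tuple still counts as a breaking change.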
   
    
   
   Relevant owners:
   
   [~altay] 
    [~charleschen70@yahoo.com]
   
   Imported from Jira [BEAM-10233](https://issues.apache.org/jira/browse/BEAM-10233). Original Jira may contain additional context.
   Reported by: tomhardman0.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org