Posted to issues@beam.apache.org by "Chamikara Madhusanka Jayalath (Jira)" <ji...@apache.org> on 2022/05/05 18:01:00 UTC

[jira] [Updated] (BEAM-14383) Improve "FailedRows" errors returned by beam.io.WriteToBigQuery

     [ https://issues.apache.org/jira/browse/BEAM-14383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chamikara Madhusanka Jayalath updated BEAM-14383:
-------------------------------------------------
    Status: Open  (was: Triage Needed)

> Improve "FailedRows" errors returned by beam.io.WriteToBigQuery
> ---------------------------------------------------------------
>
>                 Key: BEAM-14383
>                 URL: https://issues.apache.org/jira/browse/BEAM-14383
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-py-gcp
>            Reporter: Oskar Firlej
>            Priority: P2
>          Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> `WriteToBigQuery` returns `errors` when it tries to insert rows that do not match the BigQuery table schema. `errors` is a dictionary that contains one `FailedRows` key. `FailedRows` is a PCollection of tuples, where each tuple has two elements: the BigQuery table name and the row that didn't match the schema.
> This can be verified by running the `BigQueryIO deadletter pattern` https://beam.apache.org/documentation/patterns/bigqueryio/
> Using this approach I can print the failed rows in a pipeline. While the job runs, the logger also prints the reason why the rows were invalid, but that reason is not available in `FailedRows`. The reason should be included in the tuple alongside the BigQuery table and the raw row. That way, a downstream step could process both the invalid row and the reason it was rejected.
> During my research I found a couple of alternative solutions, but I think they are more complex than they need to be. That's why I explored the Beam source code and found that the fix is a small, simple change.
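The following is a minimal pure-Python sketch of how a downstream step might consume `FailedRows` elements. It assumes the two-element `(table, row)` shape described above, plus a hypothetical three-element `(table, row, errors)` shape for the requested change. The helper names `split_failure` and `summarize`, and the layout of `errors` (a list of dicts with `reason`/`message` keys), are illustrative assumptions, not part of the Beam API.

```python
from typing import Any, Dict, List, Optional, Tuple, Union

Row = Dict[str, Any]
# Hypothetical error layout: a list of dicts with "reason"/"message" keys.
RowErrors = List[Dict[str, str]]

# Current FailedRows element shape described in the issue: (table, row).
# Proposed shape (hypothetical, not an existing Beam API): (table, row, errors).
FailedElement = Union[Tuple[str, Row], Tuple[str, Row, RowErrors]]


def split_failure(element: FailedElement) -> Tuple[str, Row, Optional[RowErrors]]:
    """Normalize either shape to (table, row, errors), with errors None if absent."""
    if len(element) == 3:
        table, row, errors = element
        return table, row, errors
    table, row = element
    return table, row, None


def summarize(element: FailedElement) -> str:
    """Build a one-line description that a downstream step could log or route on."""
    table, row, errors = split_failure(element)
    if errors is None:
        # Today's two-element tuples carry no rejection reason.
        return f"{table}: rejected row {row} (reason not available)"
    reasons = ", ".join(e.get("reason", "unknown") for e in errors)
    return f"{table}: rejected row {row} (reasons: {reasons})"


# Hypothetical sample elements; table names, rows and errors are made up.
current = ("my-project:my_dataset.events", {"id": "not-an-int"})
proposed = (
    "my-project:my_dataset.events",
    {"id": "not-an-int"},
    [{"reason": "invalid", "message": "Cannot convert value to integer."}],
)
print(summarize(current))
print(summarize(proposed))
```

Inside a pipeline, the same function could be applied as `errors['FailedRows'] | beam.Map(summarize)`. With only two-element tuples, every summary ends in "reason not available", which is the gap this issue describes.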



--
This message was sent by Atlassian Jira
(v8.20.7#820007)