Posted to issues@beam.apache.org by "Jacquelyn Wax (Jira)" <ji...@apache.org> on 2021/03/04 01:31:00 UTC

[jira] [Updated] (BEAM-11919) BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows

     [ https://issues.apache.org/jira/browse/BEAM-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacquelyn Wax updated BEAM-11919:
---------------------------------
    Summary: BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows  (was: BigQueryIO.read(SerializableFunction): Collect records that could not be successfully parsed into the user-provided custom-typed object into a PCollection of TableRows)

> BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows
> --------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: BEAM-11919
>                 URL: https://issues.apache.org/jira/browse/BEAM-11919
>             Project: Beam
>          Issue Type: Wish
>          Components: io-java-gcp
>            Reporter: Jacquelyn Wax
>            Priority: P3
>
> Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() lets a user collect failed writes for downstream processing (e.g., sinking the records into some kind of dead-letter store), could BigQueryIO.read(SerializableFunction) expose the TableRows that the provided function failed to parse, so they too could be handled downstream (e.g., with some kind of dead-letter handling)?
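> For reference, a minimal sketch of the existing write-side pattern (the table spec and the upstream "rows" PCollection<TableRow> are illustrative; with streaming inserts, a retry policy must be set for failed rows to be surfaced):
>
> import com.google.api.services.bigquery.model.TableRow;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
> import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
> import org.apache.beam.sdk.values.PCollection;
>
> WriteResult writeResult =
>     rows.apply(
>         BigQueryIO.writeTableRows()
>             .to("project:dataset.table") // illustrative table spec
>             .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
>             .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));
>
> // Rows that BigQuery rejected, collectable for dead-letter handling.
> PCollection<TableRow> failedInserts = writeResult.getFailedInserts();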
> In our use case, all data loaded into our Apache Beam pipeline must meet a specified schema, where certain fields are required to be non-null. It would be ideal to collect records that do not meet the schema and output them to some kind of dead-letter store.
> Our current implementation requires us to use the slower BigQueryIO.readTableRows() and then, in a subsequent transform, attempt to parse each TableRow into a custom-typed object, sending any failures to a side output for downstream processing (sketched below). This is not terribly cumbersome, but it would be a nice feature of the connector itself.
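> A minimal sketch of that workaround (MyRecord, the table spec, and the parseToMyRecord helper are hypothetical, for illustration only):
>
> import com.google.api.services.bigquery.model.TableRow;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.transforms.ParDo;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.beam.sdk.values.PCollectionTuple;
> import org.apache.beam.sdk.values.TupleTag;
> import org.apache.beam.sdk.values.TupleTagList;
>
> final TupleTag<MyRecord> parsedTag = new TupleTag<MyRecord>() {};
> final TupleTag<TableRow> deadLetterTag = new TupleTag<TableRow>() {};
>
> PCollection<TableRow> tableRows =
>     pipeline.apply(BigQueryIO.readTableRows().from("project:dataset.table"));
>
> PCollectionTuple results =
>     tableRows.apply(
>         ParDo.of(new DoFn<TableRow, MyRecord>() {
>               @ProcessElement
>               public void processElement(ProcessContext c) {
>                 try {
>                   // Hypothetical parse function enforcing our schema.
>                   c.output(parseToMyRecord(c.element()));
>                 } catch (Exception e) {
>                   // Route rows that fail to parse to the dead-letter output.
>                   c.output(deadLetterTag, c.element());
>                 }
>               }
>             })
>             .withOutputTags(parsedTag, TupleTagList.of(deadLetterTag)));
>
> PCollection<MyRecord> parsed = results.get(parsedTag);
> PCollection<TableRow> deadLetters = results.get(deadLetterTag);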



--
This message was sent by Atlassian Jira
(v8.3.4#803005)