Posted to issues@beam.apache.org by "Jacquelyn Wax (Jira)" <ji...@apache.org> on 2021/03/04 01:31:00 UTC
[jira] [Updated] (BEAM-11919) BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows
[ https://issues.apache.org/jira/browse/BEAM-11919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacquelyn Wax updated BEAM-11919:
---------------------------------
Summary: BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows (was: BigQueryIO.read(SerializableFunction): Collect records that could not be successfully parsed into the user-provided custom-typed object into a PCollection of TableRows)
> BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows
> --------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: BEAM-11919
> URL: https://issues.apache.org/jira/browse/BEAM-11919
> Project: Beam
> Issue Type: Wish
> Components: io-java-gcp
> Reporter: Jacquelyn Wax
> Priority: P3
>
> Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() allows a user to collect failed writes for downstream processing (e.g., sinking the records into some kind of dead-letter store), could the results of a BigQueryIO.read(SerializableFunction) expose the TableRows that the provided function could not parse, so that a user can handle them downstream (e.g., with some kind of dead-letter handling)?
> In our use case, all data loaded into our Apache Beam pipeline must meet a specified schema, in which certain fields are required to be non-null. It would be ideal to collect records that do not meet the schema and output them to some kind of dead-letter store.
> Our current implementation requires us to use the slower BigQueryIO.readTableRows() and then attempt, in a subsequent transform, to parse these TableRows into a custom-typed object, outputting any failures to a side output for downstream processing. This isn't incredibly cumbersome, but built-in support would be a nice feature of the connector itself.
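> For reference, a minimal sketch of the workaround described above, using Beam's standard additional-outputs mechanism. The custom type MyRecord, the table name, and the non-null "id" field are illustrative assumptions, not part of the ticket or the Beam API:
>
> {code:java}
> import com.google.api.services.bigquery.model.TableRow;
> import org.apache.beam.sdk.Pipeline;
> import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
> import org.apache.beam.sdk.options.PipelineOptionsFactory;
> import org.apache.beam.sdk.transforms.DoFn;
> import org.apache.beam.sdk.transforms.ParDo;
> import org.apache.beam.sdk.values.PCollection;
> import org.apache.beam.sdk.values.PCollectionTuple;
> import org.apache.beam.sdk.values.TupleTag;
> import org.apache.beam.sdk.values.TupleTagList;
>
> public class ReadWithDeadLetters {
>
>   // Hypothetical custom-typed object; Serializable so the default
>   // SerializableCoder applies.
>   static class MyRecord implements java.io.Serializable {
>     final String id;
>     MyRecord(String id) { this.id = id; }
>   }
>
>   public static void main(String[] args) {
>     // Tags for the main (parsed) output and the dead-letter side output.
>     final TupleTag<MyRecord> parsedTag = new TupleTag<MyRecord>() {};
>     final TupleTag<TableRow> failedTag = new TupleTag<TableRow>() {};
>
>     Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());
>
>     // The slower generic read, since read(SerializableFunction) offers no
>     // way to recover rows the function fails to parse.
>     PCollection<TableRow> rows =
>         p.apply(BigQueryIO.readTableRows().from("my-project:my_dataset.my_table"));
>
>     PCollectionTuple results =
>         rows.apply(
>             ParDo.of(
>                     new DoFn<TableRow, MyRecord>() {
>                       @ProcessElement
>                       public void process(@Element TableRow row, MultiOutputReceiver out) {
>                         Object id = row.get("id");
>                         if (id == null) {
>                           // Schema violation: route the raw row to the side output.
>                           out.get(failedTag).output(row);
>                         } else {
>                           out.get(parsedTag).output(new MyRecord(id.toString()));
>                         }
>                       }
>                     })
>                 .withOutputTags(parsedTag, TupleTagList.of(failedTag)));
>
>     PCollection<MyRecord> parsed = results.get(parsedTag);
>     PCollection<TableRow> deadLetters = results.get(failedTag);
>     // ... write deadLetters to a dead-letter sink, continue with parsed ...
>     p.run();
>   }
> }
> {code}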
--
This message was sent by Atlassian Jira
(v8.3.4#803005)