You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 19:02:38 UTC

[GitHub] [beam] damccorm opened a new issue, #20704: BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows

damccorm opened a new issue, #20704:
URL: https://github.com/apache/beam/issues/20704

   Just as org.apache.beam.sdk.io.gcp.bigquery.WriteResult.getFailedInserts() allows a user to collect failed writes for downstream processing (e.g., sinking the records into some kind of deadletter store), could the results of a BigQueryIO.read(SerializableFunction) be collected, allowing a user to access TableRows that were not able to be parsed by the provided function , for the purpose of downstream processing (e.g., some kind of deadletter handling). 
   
   In our use case, all data loaded into our Apache Beam pipeline must meet a specified schema, where certain fields are required to be non-null. It would be ideal to collect records that do not meet the schema to output them to some kind of deadletters store.
   
   Our current implementation requires us to use the slower BigQueryIO.ReadTableRows() and then attempt, in a subsequent transform, to parse these TableRows into a custom typed object, outputting any failures to a side output for downstream processing. This isn't incredibly cumbersome, but it would be a nice feature of the connector itself.
   
   Imported from Jira [BEAM-11919](https://issues.apache.org/jira/browse/BEAM-11919). Original Jira may contain additional context.
   Reported by: jacquelynwax.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #20704: BigQueryIO.read(SerializableFunction): Collect records that could not be parsed into the custom-typed object into a PCollection of TableRows

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #20704:
URL: https://github.com/apache/beam/issues/20704#issuecomment-1334615438

   @johnjcasey I'm triaging issues - is this still relevant?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org