Posted to issues@beam.apache.org by "Svetak Vihaan Sundhar (Jira)" <ji...@apache.org> on 2022/04/25 18:24:00 UTC

[jira] [Created] (BEAM-14364) 404s in BigQueryIO don't get output to Failed Inserts PCollection

Svetak Vihaan Sundhar created BEAM-14364:
--------------------------------------------

             Summary: 404s in BigQueryIO don't get output to Failed Inserts PCollection
                 Key: BEAM-14364
                 URL: https://issues.apache.org/jira/browse/BEAM-14364
             Project: Beam
          Issue Type: Bug
          Components: io-py-gcp
            Reporter: Svetak Vihaan Sundhar


Given that BigQueryIO is configured to use createDisposition(CREATE_NEVER),
and the DynamicDestinations class returns null for a schema,
and the table for that destination does not exist in BigQuery,
when I stream records to BigQuery for that table,
then the write should fail,
and the failed rows should appear on the output PCollection for failed inserts (via getFailedInserts()).
 
Almost all of the time, the table exists beforehand, but since new tables can be created, we want this behavior to be non-fatal to the job. What we are seeing instead is that processing completely stops in those pipelines, and the jobs eventually run out of memory. I feel that the appropriate action when BigQuery returns a 404 for the table would be to emit those failed TableRows to the output PCollection and continue processing as normal.
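
For reference, a minimal sketch of the configuration described above, assuming the Java SDK (DynamicDestinations and getFailedInserts() are Java BigQueryIO APIs). The routing field, project, dataset, and the input PCollection "rows" are hypothetical placeholders, not the actual pipeline:

import com.google.api.services.bigquery.model.TableRow;
import com.google.api.services.bigquery.model.TableSchema;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.DynamicDestinations;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
import org.apache.beam.sdk.io.gcp.bigquery.TableDestination;
import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.ValueInSingleWindow;

// DynamicDestinations that returns null for the schema, as in the scenario above.
class NullSchemaDestinations extends DynamicDestinations<TableRow, String> {
  @Override
  public String getDestination(ValueInSingleWindow<TableRow> element) {
    // Hypothetical routing field; the real pipeline may derive the table name differently.
    return (String) element.getValue().get("table_name");
  }

  @Override
  public TableDestination getTable(String tableName) {
    // Hypothetical project and dataset.
    return new TableDestination("my-project:my_dataset." + tableName, null);
  }

  @Override
  public TableSchema getSchema(String tableName) {
    // CREATE_NEVER is used, so no schema is supplied.
    return null;
  }
}

// Streaming write configured as described; when the destination table does not exist,
// the expectation is that the rows land on getFailedInserts() instead of stalling the pipeline.
WriteResult result =
    rows.apply(
        "WriteToBigQuery",
        BigQueryIO.writeTableRows()
            .to(new NullSchemaDestinations())
            .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
            .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
            .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));

PCollection<TableRow> failedInserts = result.getFailedInserts();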



--
This message was sent by Atlassian Jira
(v8.20.7#820007)