Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/09/14 05:45:36 UTC

[GitHub] [beam] reuvenlax commented on issue #21713: 404s in BigQueryIO don't get output to Failed Inserts PCollection

reuvenlax commented on issue #21713:
URL: https://github.com/apache/beam/issues/21713#issuecomment-1246266615

   I suspect it does happen. Some context:
   This feature was intended for known persistent failures at the row level, and is implemented using the per-row status returned by BigQuery. Examples would be a row that does not match the BigQuery schema or a row that exceeds BigQuery's size limit. The feature does not capture potentially ephemeral failures, such as the RPC to BigQuery itself failing; in that case, we simply retry the RPC.
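   
   As a rough illustration (a sketch, not code from this issue; the table name and the rows input are placeholders), the per-row failed-inserts output is typically consumed in the Beam Java SDK along these lines:
   
      import com.google.api.services.bigquery.model.TableRow;
      import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
      import org.apache.beam.sdk.io.gcp.bigquery.BigQueryInsertError;
      import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;
      import org.apache.beam.sdk.io.gcp.bigquery.WriteResult;
      import org.apache.beam.sdk.values.PCollection;
   
      // Assumes an existing PCollection<TableRow> named rows (placeholder input).
      WriteResult result = rows.apply(
          BigQueryIO.writeTableRows()
              .to("my-project:my_dataset.my_table")  // placeholder table name
              .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
              // Retry rows whose insert errors look transient; persistent per-row
              // failures (e.g. schema mismatch, oversized row) are emitted to the
              // failed-inserts output instead.
              .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors())
              .withExtendedErrorInfo());
   
      // Rows rejected by BigQuery at the row level, with their error details.
      PCollection<BigQueryInsertError> failedInserts = result.getFailedInsertsWithErr();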
   
   While a 404 returned by the RPC might seem like a persistent error, that generally isn't the case. For instance, a temporary outage might cause the RPC to return 404, yet the retry succeeds. We didn't want a temporary BigQuery outage to cause all data to be sent to the dead-letter output for a period of time.
   
   However, I would like to understand the use case better. Is this a case in which records destined for a specific table are sent to the Dataflow pipeline before the BigQuery table is created? Is there some offline process creating those tables, and is that process simply delayed?

