You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by Christian Battista <ch...@benchsci.com> on 2020/09/08 20:28:05 UTC

Problems with BigQuery Sinks - Error encountered during execution. Retrying may solve the problem

Hi all,

We've been using Cloud Dataflow (currently on Apache Beam Python 3.7 SDK
2.22.0) since about January and are running into a problem recently with BQ
Sinks.

Essentially what seems to be happening is our (batch) dataflow pipelines
fail because of a failure in one of the load jobs that brings data from GCS
into temp tables in BQ
[BigQuery/BigQueryBatchFileLoads/WaitForTempTableLoadJobs], and it fails
with 'Error encountered during execution. Retrying may solve the problem'.
If we find the offending load job and manually re-try it (as the error
message suggests), it does work.

This is a transient error for us that doesn't occur every pipeline run but
seems to be happening more in recent weeks.  Unfortunately, there does not
seem to be any retry behaviour available here that we can use in the
context of our pipeline - and we are pretty reliant on BQ for downstream
processes so switching to a new sink is not a happy option for us.  I am
wondering if any other Dataflow/BigQuery users have been encountering this
lately and if so whether any workarounds have been attempted.

Thanks,
-C

-- 
Christian Battista
Senior Data Engineer, BenchSci
www.benchsci.com