Posted to user@beam.apache.org by Yohei Onishi <vi...@gmail.com> on 2019/08/08 05:55:37 UTC
BigQueryIO - insert retry policy in Apache Beam
Hi,
If you are familiar with BigQuery insert retry policies in the Apache Beam API
(BigQueryIO), please help me understand the following behavior. I am using the
Dataflow runner.
- How does a Dataflow job behave if I specify retryTransientErrors?
- shouldRetry provides the error from BigQuery so I can decide whether to
retry. Where can I find the expected errors from BigQuery?
*BigQuery insert retry policies*
https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html
- alwaysRetry - Always retry all failures.
- neverRetry - Never retry any failures.
- retryTransientErrors - Retry all failures except for known persistent
errors.
- shouldRetry - Return true if this failure should be retried.
*Background*
- When my Cloud Dataflow job inserted a very old timestamp (more than one
year in the past) into BigQuery, I got the following error.
- The retries did not stop, so I added retryTransientErrors to the
BigQueryIO.Write step, and then the retries stopped (see the snippet below
the error log).
jsonPayload: {
> exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
> [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for
> field timestamp_scanned of the destination table
> fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the
> allowed bounds. You can only stream to date range within 365 days in the
> past and 183 days in the future relative to the current
> date.","reason":"invalid"}],
> After the first error, Dataflow kept retrying the insert, and the insert was
> always rejected by BigQuery with the same error.
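For reference, the BigQueryIO.Write step mentioned above now looks roughly like
this. This is a simplified sketch rather than my exact pipeline code (the schema
and the rest of the pipeline are omitted), and as far as I understand the retry
policy only takes effect for streaming inserts:

import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;

// "rows" is the PCollection<TableRow> produced earlier in the pipeline.
rows.apply("WriteToBigQuery",
    BigQueryIO.writeTableRows()
        .to("fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp")
        .withMethod(BigQueryIO.Write.Method.STREAMING_INSERTS)
        // Stop retrying rows that BigQuery rejects with a persistent error
        // instead of retrying them forever.
        .withFailedInsertRetryPolicy(InsertRetryPolicy.retryTransientErrors()));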
I also posted the same question here
https://stackoverflow.com/questions/57403980/biqquery-insert-retry-policy-in-apache-beam
Yohei Onishi
Re: BigQueryIO - insert retry policy in Apache Beam
Posted by Lukasz Cwik <lc...@google.com>.
On Wed, Aug 7, 2019 at 10:55 PM Yohei Onishi <vi...@gmail.com> wrote:
> Hi,
>
> If you are familiar with BigQuery insert retry policies in the Apache Beam API
> (BigQueryIO), please help me understand the following behavior. I am using the
> Dataflow runner.
>
> - How does a Dataflow job behave if I specify retryTransientErrors?
>
>
All errors are considered transient unless BigQuery says that the error
reason is one of "invalid", "invalidQuery", or "notImplemented":
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java#L44
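In code terms, the check is roughly the following. This is my own sketch of
that logic rather than a copy of the file; the reason strings are the ones
BigQuery returns in the insert response:

import com.google.api.services.bigquery.model.ErrorProto;
import com.google.common.collect.ImmutableSet;
import java.util.List;
import java.util.Set;

class TransientErrorCheck {
  // Reasons that retryTransientErrors() treats as persistent.
  static final Set<String> PERSISTENT_ERROR_REASONS =
      ImmutableSet.of("invalid", "invalidQuery", "notImplemented");

  // True if none of the errors for a failed row carries a persistent reason.
  static boolean isTransient(List<ErrorProto> rowErrors) {
    for (ErrorProto error : rowErrors) {
      if (error.getReason() != null
          && PERSISTENT_ERROR_REASONS.contains(error.getReason())) {
        return false; // e.g. "reason":"invalid" in your log is never retried
      }
    }
    return true; // anything else is considered transient and is retried
  }
}

That is also why retryTransientErrors stopped the retries in your case: the
error in your log carries "reason":"invalid", which is on the persistent list.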
>
> - shouldRetry provides the error from BigQuery so I can decide whether to
> retry. Where can I find the expected errors from BigQuery?
>
You can't, since the errors are not visible to the caller:
https://github.com/apache/beam/blob/c2f0d282337f3ae0196a7717712396a5a41fdde1/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.java#L36
I'm not sure if this was done on purpose or whether Apache Beam should
expose the errors so users can write their own retry logic.
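To illustrate what that would enable, if the Context did expose the errors, a
user-defined policy could look something like the sketch below. The
getInsertErrors() accessor here is hypothetical; it is exactly the kind of
accessor that is not available to callers at the commit linked above.

import com.google.api.services.bigquery.model.ErrorProto;
import org.apache.beam.sdk.io.gcp.bigquery.InsertRetryPolicy;

// Hypothetical user-defined policy: retry everything except rows that
// BigQuery rejected with the "invalid" reason.
InsertRetryPolicy retryUnlessInvalid = new InsertRetryPolicy() {
  @Override
  public boolean shouldRetry(InsertRetryPolicy.Context context) {
    // getInsertErrors() is the hypothetical accessor discussed above; today
    // the error details are not reachable from user code.
    for (ErrorProto error : context.getInsertErrors().getErrors()) {
      if ("invalid".equals(error.getReason())) {
        return false;
      }
    }
    return true;
  }
};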
> *BigQuery insert retry policies*
>
> https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/InsertRetryPolicy.html
>
>
> - alwaysRetry - Always retry all failures.
> - neverRetry - Never retry any failures.
> - retryTransientErrors - Retry all failures except for known
> persistent errors.
> - shouldRetry - Return true if this failure should be retried.
>
> *Background*
>
> - When my Cloud Dataflow job inserted a very old timestamp (more than one
> year in the past) into BigQuery, I got the following error.
> - The retries did not stop, so I added retryTransientErrors to the
> BigQueryIO.Write step, and then the retries stopped.
>
> jsonPayload: {
>> exception: "java.lang.RuntimeException: java.io.IOException: Insert failed:
>> [{"errors":[{"debugInfo":"","location":"","message":"Value 690000000 for
>> field timestamp_scanned of the destination table
>> fr-prd-datalake:rfid_raw.store_epc_transactions_cr_uqjp is outside the
>> allowed bounds. You can only stream to date range within 365 days in the
>> past and 183 days in the future relative to the current
>> date.","reason":"invalid"}],
>> After the first error, Dataflow kept retrying the insert, and the insert was
>> always rejected by BigQuery with the same error.
>
>
> I also posted the same question here
> https://stackoverflow.com/questions/57403980/biqquery-insert-retry-policy-in-apache-beam
>
> Yohei Onishi
>