Posted to issues@beam.apache.org by "Beam JIRA Bot (Jira)" <ji...@apache.org> on 2021/10/20 17:25:01 UTC

[jira] [Updated] (BEAM-12783) WriteToBigQuery ignores insert_retry_strategy on HttpErrors

     [ https://issues.apache.org/jira/browse/BEAM-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Beam JIRA Bot updated BEAM-12783:
---------------------------------
    Labels: stale-P2  (was: )

> WriteToBigQuery ignores insert_retry_strategy on HttpErrors
> -----------------------------------------------------------
>
>                 Key: BEAM-12783
>                 URL: https://issues.apache.org/jira/browse/BEAM-12783
>             Project: Beam
>          Issue Type: Bug
>          Components: io-py-gcp
>    Affects Versions: 2.31.0
>            Reporter: Adam Whitmore
>            Priority: P2
>              Labels: stale-P2
>
> {{insertAll}} retries forever in a streaming pipeline running on {{2.31.0}}, even with {{insert_retry_strategy=RetryStrategy.RETRY_NEVER}} and {{create_disposition=BigQueryDisposition.CREATE_NEVER}}.
> Found while testing a pipeline's error handling by writing to a table that doesn't exist: no elements reached {{BigQueryWriteFn.FAILED_ROWS}}, and these errors repeated in the logs:
> {code:java}
> Error message from worker: generic::unknown: Traceback (most recent call last):
>   File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
>   File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>   File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
>     return self._flush_all_batches()
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
>     for destination in list(self._rows_buffer.keys())
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
>     if self._rows_buffer[destination]
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
>     skip_invalid_rows=True)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
>     project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
>     return fun(*args, **kwargs)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
>     response = self.client.tabledata.InsertAll(request)
>   File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
>     config, request, global_params=global_params)
>   File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
>     return self.ProcessHttpResponse(method_config, http_response, request)
>   File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
>     self.__ProcessHttpResponse(method_config, http_response, request))
>   File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
>     http_response, method_config=method_config, request=request)
> apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
>   "error": {
>     "code": 404,
>     "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
>     "errors": [
>       {
>         "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
>         "domain": "global",
>         "reason": "notFound"
>       }
>     ],
>     "status": "NOT_FOUND"
>   }
> }
> ...
> {code}
> Possibly related to BEAM-12362. The pipeline previously ran on {{2.29.0}}, which would repeatedly log the following message with no trace of the underlying error:
> {code:java}
> There were errors inserting to BigQuery. Will not retry. Errors were []
> {code}
> {{2.31.0}} logs the errors but ignores the retry strategy, preventing errors from being handled through the {{FailedRows}} tag.
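
The behaviour the report expects can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not Beam's implementation: {{RetryPolicy}}, {{HttpNotFound}}, {{FlushResult}}, {{flush_batch}} and {{insert_into_missing_table}} are hypothetical names invented for the illustration, and the retry loop is bounded by a made-up {{max_attempts}} parameter so the example terminates. It shows a request-level HTTP error being checked against the retry policy, with the rows routed to a failed-rows output when the policy says never to retry.

```python
from dataclasses import dataclass, field
from enum import Enum


class RetryPolicy(Enum):
    # Hypothetical stand-in for the retry strategy values named in the report.
    ALWAYS = "RETRY_ALWAYS"
    NEVER = "RETRY_NEVER"


class HttpNotFound(Exception):
    """Hypothetical stand-in for an HTTP 404 raised by the insert call."""


@dataclass
class FlushResult:
    inserted: list = field(default_factory=list)
    failed_rows: list = field(default_factory=list)
    attempts: int = 0


def flush_batch(rows, insert_fn, policy, max_attempts=5):
    """Try to insert rows; on a request-level HTTP error, consult the policy."""
    result = FlushResult()
    while result.attempts < max_attempts:
        result.attempts += 1
        try:
            insert_fn(rows)
        except HttpNotFound as err:
            if policy is RetryPolicy.NEVER:
                # Expected behaviour: stop retrying and hand the rows,
                # together with the error text, to the failed-rows output.
                result.failed_rows = [(row, str(err)) for row in rows]
                return result
            # ALWAYS: try again (bounded here only so the sketch ends).
            continue
        result.inserted = list(rows)
        return result
    return result


def insert_into_missing_table(rows):
    # Simulates writing to a table that does not exist.
    raise HttpNotFound("Not found: Table testdb__dbo__raw.customers")


if __name__ == "__main__":
    outcome = flush_batch(
        [{"id": 1}], insert_into_missing_table, RetryPolicy.NEVER)
    print(outcome.attempts, outcome.failed_rows)
```

In this model, the reported {{2.31.0}} behaviour corresponds to the exception escaping the batch flush before the policy is ever consulted, so the bundle fails and is retried and nothing reaches the failed-rows output.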


