Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:03:25 UTC

[GitHub] [beam] damccorm opened a new issue, #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

damccorm opened a new issue, #21080:
URL: https://github.com/apache/beam/issues/21080

   `insertAll` will retry forever on a streaming pipeline running on `2.31.0`, with `insert_retry_strategy=RetryStrategy.RETRY_NEVER`, and `create_disposition=BigQueryDisposition.CREATE_NEVER`.
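
The behavior that `insert_retry_strategy=RetryStrategy.RETRY_NEVER` is expected to give can be sketched as a per-row decision. This is a simplified, stdlib-only model: the constant names mirror Beam's `RetryStrategy` values, but `should_retry` here and the assumed set of non-transient reasons are illustrative stand-ins, not Beam's implementation.

```python
# Simplified model of the per-row retry decision that insert_retry_strategy
# is meant to control. Names mirror Beam's RetryStrategy constants, but this
# is an illustration, not Beam's code.

RETRY_ALWAYS = "RETRY_ALWAYS"
RETRY_NEVER = "RETRY_NEVER"
RETRY_ON_TRANSIENT_ERROR = "RETRY_ON_TRANSIENT_ERROR"

# Assumed set of BigQuery error "reason" values treated as permanent.
NON_TRANSIENT_REASONS = {"invalid", "invalidQuery", "notImplemented"}


def should_retry(strategy: str, reason: str) -> bool:
    """Return True if a row that failed with `reason` should be retried."""
    if strategy == RETRY_ALWAYS:
        return True
    if strategy == RETRY_NEVER:
        return False
    if strategy == RETRY_ON_TRANSIENT_ERROR:
        return reason not in NON_TRANSIENT_REASONS
    raise ValueError(f"unknown strategy: {strategy!r}")


# Under RETRY_NEVER, even a "notFound" error should not be retried.
print(should_retry(RETRY_NEVER, "notFound"))
```

Under this assumed reason set, `RETRY_ON_TRANSIENT_ERROR` would still retry a `notFound` row; only `RETRY_NEVER` rules out retries unconditionally.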
   
   Found while testing a pipeline's error handling by writing to a table that doesn't exist: no elements ended up in `BigQueryWriteFn.FAILED_ROWS`, and these errors repeated in the logs:
   ```
   Error message from worker: generic::unknown: Traceback (most recent call last):
     File "apache_beam/runners/common.py", line 1257, in apache_beam.runners.common.DoFnRunner._invoke_bundle_method
     File "apache_beam/runners/common.py", line 510, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
     File "apache_beam/runners/common.py", line 516, in apache_beam.runners.common.DoFnInvoker.invoke_finish_bundle
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1268, in finish_bundle
       return self._flush_all_batches()
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1278, in _flush_all_batches
       for destination in list(self._rows_buffer.keys())
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1279, in <listcomp>
       if self._rows_buffer[destination]
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery.py", line 1312, in _flush_batch
       skip_invalid_rows=True)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 1125, in insert_rows
       project_id, dataset_id, table_id, final_rows, skip_invalid_rows)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/utils/retry.py", line 253, in wrapper
       return fun(*args, **kwargs)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 637, in _insert_all_rows
       response = self.client.tabledata.InsertAll(request)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/internal/clients/bigquery/bigquery_v2_client.py", line 795, in InsertAll
       config, request, global_params=global_params)
     File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 731, in _RunMethod
       return self.ProcessHttpResponse(method_config, http_response, request)
     File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 737, in ProcessHttpResponse
       self.__ProcessHttpResponse(method_config, http_response, request))
     File "/usr/local/lib/python3.7/site-packages/apitools/base/py/base_api.py", line 604, in __ProcessHttpResponse
       http_response, method_config=method_config, request=request)
   apitools.base.py.exceptions.HttpNotFoundError: HttpError accessing <https://bigquery.googleapis.com/bigquery/v2/projects/<REDACTED>/datasets/testdb__dbo__raw/tables/customers/insertAll?alt=json>: response: <{'vary': 'Origin, X-Origin, Referer', 'content-type': 'application/json; charset=UTF-8', 'date': 'Sat, 21 Aug 2021 10:00:13 GMT', 'server': 'ESF', 'cache-control': 'private', 'x-xss-protection': '0', 'x-frame-options': 'SAMEORIGIN', 'transfer-encoding': 'chunked', 'status': '404', 'content-length': '344', '-content-encoding': 'gzip'}>, content <{
     "error": {
       "code": 404,
       "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
       "errors": [
         {
           "message": "Not found: Table <REDACTED>:testdb__dbo__raw.customers",
           "domain": "global",
           "reason": "notFound"
         }
       ],
       "status": "NOT_FOUND"
     }
   }
   ...
   ```
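
The traceback hints at why the strategy never comes into play: the exception is raised by the insert call itself, inside `finish_bundle`. The sketch below is a stdlib-only simplification with hypothetical names (it is not Beam's code): the strategy is consulted only for per-row errors returned by a successful call, so a 404 raised by the call escapes `flush_batch` before `retry_never` is ever invoked. Dataflow streaming jobs retry failed bundles indefinitely, which would be consistent with the repeated log entries.

```python
# Simplified model of the control flow in the traceback: the retry strategy
# only sees per-row errors returned by a successful call, so an exception
# raised by the call itself skips it. Hypothetical names only.


class HttpNotFound(Exception):
    """Stand-in for the HTTP 404 raised when the table does not exist."""


def insert_missing_table(rows):
    # The whole request fails; no per-row errors are returned.
    raise HttpNotFound("Not found: Table testdb__dbo__raw.customers")


def flush_batch(rows, insert_fn, should_retry):
    """Insert rows and return (rows_to_retry, failed_rows)."""
    row_errors = insert_fn(rows)  # an exception escapes from here
    to_retry, failed = [], []
    for err in row_errors:
        row = rows[err["index"]]
        reason = err["errors"][0]["reason"]
        (to_retry if should_retry(reason) else failed).append(row)
    return to_retry, failed


def retry_never(reason):
    # Behaves like RETRY_NEVER: never retry a failed row.
    return False


try:
    flush_batch([{"id": 1}], insert_missing_table, retry_never)
except HttpNotFound as exc:
    # retry_never was never called; the exception fails the whole bundle.
    print("bundle failed:", exc)
```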
   
   Possibly related to BEAM-12362. The pipeline had previously been running on `2.29.0`, which repeatedly logged the following message without any error details:
   ```
   There were errors inserting to BigQuery. Will not retry. Errors were []
   ```
   
   `2.31.0` logs the errors but ignores the retry strategy, which prevents the failed rows from being handled through the `FailedRows` tag.
   
   Imported from Jira [BEAM-12783](https://issues.apache.org/jira/browse/BEAM-12783). Original Jira may contain additional context.
   Reported by: ajdub980a.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] closed issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors
URL: https://github.com/apache/beam/issues/21080


[GitHub] [beam] ajdub508 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ajdub508 (via GitHub)" <gi...@apache.org>.
ajdub508 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1687921266

   Submitted [PR 28091](https://github.com/apache/beam/pull/28091); the [PR description](https://github.com/apache/beam/pull/28091#issue-1861152554) explains the fix and asks for feedback.


[GitHub] [beam] ajdub508 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ajdub508 (via GitHub)" <gi...@apache.org>.
ajdub508 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1687906917

   .take-issue


[GitHub] [beam] ajdub508 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ajdub508 (via GitHub)" <gi...@apache.org>.
ajdub508 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1675911299

   I am having trouble finding the higher-level function that handles [this re-raised exception](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L736). Could anyone point it out for me? I haven't found where that exception leads to the retry strategy being evaluated.
   
   While researching this one, I found that the `self.gcp_bq_client.insert_rows_json` call [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L720-L731) does not return anything in `errors` for failures such as the `google.api_core.exceptions.NotFound` error raised when a table doesn't exist.
   
   Those errors raise a `GoogleAPICallError` instead (`NotFound` is a subclass of it), which is caught and re-raised in the `except` block [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L736), with the expectation that it will be retried appropriately. I haven't yet tracked down where it goes from there.
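
One possible way to route such failures through the retry strategy is to catch the call-level exception, report it against every row in the same `{'index': ..., 'errors': [...]}` shape that `insert_rows_json` uses for row-level errors, and then split the rows by strategy. This is an illustrative, stdlib-only sketch under assumptions, not necessarily the approach taken in PR 28091: `APICallError` is a hypothetical stand-in for `google.api_core.exceptions.GoogleAPICallError`, and emitting `(destination, row)` pairs under the `FailedRows` tag is assumed.

```python
# Sketch: turn a call-level API failure into per-row errors so the retry
# strategy can decide each row's fate. Hypothetical names throughout;
# APICallError stands in for google.api_core.exceptions.GoogleAPICallError.
from typing import Callable, Dict, List, Tuple

FAILED_ROWS = "FailedRows"  # tag name mentioned in the issue


class APICallError(Exception):
    def __init__(self, reason: str, message: str):
        super().__init__(message)
        self.reason = reason


def insert_with_row_errors(rows: List[dict],
                           insert_fn: Callable[[List[dict]], List[dict]]
                           ) -> List[dict]:
    """Call insert_fn; on a call-level failure, report it against every row."""
    try:
        return insert_fn(rows)
    except APICallError as exc:
        error = {"reason": exc.reason, "message": str(exc)}
        return [{"index": i, "errors": [error]} for i in range(len(rows))]


def split_rows(destination: str, rows: List[dict], row_errors: List[dict],
               should_retry: Callable[[str], bool]
               ) -> Tuple[List[dict], Dict[str, list]]:
    """Return (rows to retry, outputs keyed by tag)."""
    to_retry: List[dict] = []
    outputs: Dict[str, list] = {FAILED_ROWS: []}
    for err in row_errors:
        row = rows[err["index"]]
        if should_retry(err["errors"][0]["reason"]):
            to_retry.append(row)
        else:
            outputs[FAILED_ROWS].append((destination, row))
    return to_retry, outputs


def missing_table(rows):
    raise APICallError("notFound", "Not found: Table testdb__dbo__raw.customers")


rows = [{"id": 1}, {"id": 2}]
errs = insert_with_row_errors(rows, missing_table)
retry, out = split_rows("testdb__dbo__raw.customers", rows, errs,
                        should_retry=lambda reason: False)  # like RETRY_NEVER
print(retry, out[FAILED_ROWS])
```

Applying the call-level reason to every row is a deliberate simplification: a missing table affects the whole batch, so each row gets the same error, yet the strategy still decides per row.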
   


[GitHub] [beam] damccorm commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "damccorm (via GitHub)" <gi...@apache.org>.
damccorm commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1412166710

   Hey @nervoussidd, you can find our contribution guide here: https://beam.apache.org/contribute/
   
   For future reference, you can self-assign issues by commenting `.take-issue`, and a bot will assign the issue to you.


[GitHub] [beam] nervoussidd commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "nervoussidd (via GitHub)" <gi...@apache.org>.
nervoussidd commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1411558447

   Hey Danny,
   Could you please assign this issue to me? I would love to contribute in any way I can. One more request: could you guide me while I am contributing?
   


[GitHub] [beam] ahmedabu98 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1727299542

   @ajdub508, you can close this issue by commenting `.close-issue`, and a bot will close it.


[GitHub] [beam] ajdub508 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ajdub508 (via GitHub)" <gi...@apache.org>.
ajdub508 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1727594255

   .close-issue


[GitHub] [beam] ajdub508 commented on issue #21080: WriteToBigQuery ignores insert_retry_strategy on HttpErrors

Posted by "ajdub508 (via GitHub)" <gi...@apache.org>.
ajdub508 commented on issue #21080:
URL: https://github.com/apache/beam/issues/21080#issuecomment-1727293448

   This issue has been resolved with [PR 28091](https://github.com/apache/beam/pull/28091).

