Posted to issues@beam.apache.org by "Chamikara Madhusanka Jayalath (Jira)" <ji...@apache.org> on 2021/05/19 02:22:00 UTC

[jira] [Commented] (BEAM-12362) BigQuery sink swallows HttpErrors when performing streaming inserts preventing retries

    [ https://issues.apache.org/jira/browse/BEAM-12362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347262#comment-17347262 ] 

Chamikara Madhusanka Jayalath commented on BEAM-12362:
------------------------------------------------------

Details:

If "self.client.tabledata.InsertAll(request)" raises an HttpError at [1], we swallow the exception at [2] and return "False, []" at [3].

Then, at the caller [4], we set "should_retry" to False because the list of errors is empty, so this error is never retried.

As a result, the corresponding rows are added to failed_rows at [5] without ever being retried, even when the RetryStrategy is set to RetryStrategy.RETRY_ALWAYS.

[1] [https://github.com/apache/beam/blob/158e177adc987cdc35c6a58b5292b4999c2139c3/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L637]
[2] [https://github.com/apache/beam/blob/158e177adc987cdc35c6a58b5292b4999c2139c3/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L644]
[3] [https://github.com/apache/beam/blob/158e177adc987cdc35c6a58b5292b4999c2139c3/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L650]
[4] [https://github.com/apache/beam/blob/207d02be26eea98d0cac344bf9bd5f3a17c35282/sdks/python/apache_beam/io/gcp/bigquery.py#L1316]
[5] [https://github.com/apache/beam/blob/207d02be26eea98d0cac344bf9bd5f3a17c35282/sdks/python/apache_beam/io/gcp/bigquery.py#L1329]
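
For illustration only, the failure mode described above can be reproduced with a small standalone sketch. This is not the Beam code: FakeHttpError, insert_rows_swallowing and flush are hypothetical stand-ins, and the retry decision is reduced to "retry if any error entry was reported", which is an assumption made to keep the example short.

```python
class FakeHttpError(Exception):
    """Hypothetical stand-in for the HttpError raised by the BigQuery client."""


def insert_rows_swallowing(rows):
    """Mimics the behavior in this comment: the exception is caught and
    (False, []) is returned, so no per-row errors reach the caller."""
    try:
        # Simulate the InsertAll request failing at the HTTP level.
        raise FakeHttpError("503 Service Unavailable")
    except FakeHttpError:
        return False, []


def flush(rows, insert_fn, retry_always=True):
    """Simplified caller (assumption): the retry decision is derived only
    from the list of per-row errors returned by insert_fn."""
    passed, errors = insert_fn(rows)
    # any() over an empty list is False, so an HTTP-level failure that
    # produced no error entries can never trigger a retry.
    should_retry = retry_always and any(True for _ in errors)
    # Rows that did not pass and will not be retried end up as failed rows.
    failed_rows = [] if (passed or should_retry) else list(rows)
    return should_retry, failed_rows


should_retry, failed_rows = flush(["row-1", "row-2"], insert_rows_swallowing)
print(should_retry, failed_rows)
```

Even with retry_always=True, the sketch reports should_retry as False and sends both rows to failed_rows, which matches the behavior reported here. Letting the exception propagate, or returning a non-empty error list, would give the caller something to base a retry on.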

> BigQuery sink swallows HttpErrors when performing streaming inserts preventing retries
> --------------------------------------------------------------------------------------
>
>                 Key: BEAM-12362
>                 URL: https://issues.apache.org/jira/browse/BEAM-12362
>             Project: Beam
>          Issue Type: Bug
>          Components: io-java-gcp
>            Reporter: Chamikara Madhusanka Jayalath
>            Assignee: Chamikara Madhusanka Jayalath
>            Priority: P1
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Relevant code is here: [https://github.com/apache/beam/blob/158e177adc987cdc35c6a58b5292b4999c2139c3/sdks/python/apache_beam/io/gcp/bigquery_tools.py#L644]
> Only affects Python BigQuery sink when writing using streaming inserts.
> This seems to have been introduced in [https://github.com/apache/beam/pull/13217], so Beam versions 2.27.0 and later are affected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)