You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "tomlynchRNA (via GitHub)" <gi...@apache.org> on 2023/02/06 05:33:24 UTC

[GitHub] [beam] tomlynchRNA opened a new pull request, #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

tomlynchRNA opened a new pull request, #25325:
URL: https://github.com/apache/beam/pull/25325

   Ref #25225 
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
    - [ ] ~~Update `CHANGES.md` with noteworthy changes.~~
    - [ ] ~~If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).~~
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
   
   To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
   
   GitHub Actions Tests Status (on master branch)
   ------------------------------------------------------------------------------------------------
   [![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
   [![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
   [![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
   
   See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
   
   cc @ragyibrahim


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421561183

   @tomlynchRNA this LGTM but before we can merge this, can you write up a test to confirm these changes work as intended?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1497416166

   waiting on author


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1097324853


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -405,6 +405,8 @@ def chain_after(result):
 from apache_beam.utils.annotations import deprecated
 from apache_beam.utils.annotations import experimental
 
+from google.api_core.exceptions import ClientError, GoogleAPICallError

Review Comment:
   Current tests are failing with `ModuleNotFoundError: No module named 'google.api_core'`. This import should be in the `try:` block on line 410



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102127074


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   ```suggestion
             _KNOWN_TABLES.remove(destination)
             self._create_table_if_needed(bigquery_tools.parse_table_reference(destination), self.schema)
   ```
   
   We will actually need to create the table again here, otherwise we will continue running into the 404 error.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1448079838

   Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1497388292

   Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @damccorm for label python.
   R: @pabloem for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1666845946

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   > I believe after raising, the bundle will retry, _create_table_if_needed is called again
   
   Ahh you're right, ignore my suggestion.
   
   > I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
   
   The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
   
   The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (DirectRunner fails the pipeline, DataflowRunner 3 times then fails the pipeline).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423496431

   Yeah it's a known issue. See this comment to fix:
   
   https://github.com/apache/beam/issues/22742#issuecomment-1218216468
   
   @ahmedabu98 uses linux and it's fine. I have an M1 mac experiencing same issue. The problem is that protobuf installed from wheel missing some internal methods used by beam


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1491837708

   Reminder, please take a look at this pr: @tvalentyn @chamikaramj 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102095271


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+                        "This sometimes occurs due to the table being deleted while a streaming job is running and "
+                        "the destination was previously added to the _KNOWN_TABLES".format(destination))

Review Comment:
   @ahmedabu98 addressed all the above in the latest commit [88205fa](https://github.com/apache/beam/pull/25325/commits/88205faef4436d4139f4ae792d8688639ac3a6b2)
   Hopefully, that should do it 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1418563015

   Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @jrmccluskey for label python.
   R: @ahmedabu98 for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
   
   The PR bot will only process comments in the main thread (not review comments).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420754662

   Run Python 3.7 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423087776

   I never got tox yapf working locally. I am doing this to fix lint:
   
   ```
   # Run from root beam repo dir
   pip install yapf==0.29.0
   git diff master --name-only | grep "\.py$" | xargs yapf --in-place
   ```
   
   hopefully it works


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423626173

   @Abacn @ahmedabu98 
   Finally worked. See output below:
   
   ```bash
   tox -e py3-yapf
   .pkg: _optional_hooks> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: get_requires_for_build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: prepare_metadata_for_build_wheel> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   py3-yapf: install_package> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/16/apache-beam-2.46.0.dev0.tar.gz
   py3-yapf: commands_pre[0]> python --version
   Python 3.10.9
   py3-yapf: commands_pre[1]> pip --version
   pip 23.0 from /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/lib/python3.10/site-packages/pip (python 3.10)
   py3-yapf: commands_pre[2]> pip check
   No broken requirements found.
   py3-yapf: commands_pre[3]> bash /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/scripts/run_tox_cleanup.sh
   py3-yapf: commands[0]> yapf --version
   yapf 0.29.0
   py3-yapf: commands[1]> time yapf --in-place --parallel --recursive apache_beam
          11.89 real       125.12 user         1.01 sys
   py3-yapf: commands_post[0]> bash /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/scripts/run_tox_cleanup.sh
   .pkg: _exit> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
     py3-yapf: OK (17.31=setup[4.82]+cmd[0.01,0.14,0.36,0.03,0.04,11.89,0.03] seconds)
     congratulations :) (17.36 seconds)
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1437693899

   Thanks for the contribution but I'm not sure I agree with the assumptions of the corresponding issue. Added a comment here: https://github.com/apache/beam/issues/25225#issuecomment-1437693540


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1451773222

   Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
   
   R: @tvalentyn for label python.
   R: @chamikaramj for label io.
   
   Available commands:
   - `stop reviewer notifications` - opt out of the automated review tooling
   - `remind me after tests pass` - tag the comment author after tests pass
   - `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] chamikaramj commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1131734190


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.warning(
+              """Table %d was not found.
+              Table will be removed from _KNOWN_TABLES and bundle will retry.
+              This sometimes occurs due to the table being deleted while a 
+              streaming job is running and the destination was previously 
+              added to the _KNOWN_TABLES set"""
+              %destination)
+        raise

Review Comment:
   I think it should be safe to just raise here. The runner should either re-try or fail the pipeline.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421947099

   Hey, @ahmedabu98 I've tried running the commands to check for linting changes that need to be made, however, I get the below error: 
   
   ```bash
   tox -e py3-yapf && ../../gradlew lint
   .pkg: _optional_hooks> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: get_requires_for_build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: prepare_metadata_for_build_wheel> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   .pkg: build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
   py3-yapf: install_package> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz
   Processing ./target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz
     Preparing metadata (setup.py) ... error
   
     error: subprocess-exited-with-error
   
     × python setup.py egg_info did not run successfully.
     │ exit code: 1
     ╰─> [10 lines of output]
         Traceback (most recent call last):
           File "<string>", line 2, in <module>
           File "<pip-setuptools-caller>", line 34, in <module>
           File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/setup.py", line 180, in <module>
             generate_protos_first()
           File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/setup.py", line 150, in generate_protos_first
             gen_protos.generate_proto_files()
           File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/gen_protos.py", line 476, in generate_proto_files
             raise RuntimeError(error_msg)
         RuntimeError: Not in apache git tree, unable to find proto definitions.
         [end of output]
   
     note: This error originates from a subprocess, and is likely not a problem with pip.
   error: metadata-generation-failed
   
   × Encountered error while generating package metadata.
   ╰─> See above for output.
   
   note: This is an issue with the package mentioned above, not pip.
   hint: See above for details.
   py3-yapf: exit 1 (0.61 seconds) /Users/ragy/Documents/RNA/GitHub/beam/sdks/python> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz pid=80407
   .pkg: _exit> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
     py3-yapf: FAIL code 1 (1.78 seconds)
     evaluation failed :( (1.85 seconds)
   ```
   
   I checked the issues and it seems that this was a problem in apache-beam 2.43 and was fixed. Please see the issue: https://github.com/apache/beam/issues/23227
   
   Not sure how to get past this
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102126253


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.warning(
+              """Table %d was not found.
+              Table will be removed from _KNOWN_TABLES and bundle will retry.
+              This sometimes occurs due to the table being deleted while a 
+              streaming job is running and the destination was previously 
+              added to the _KNOWN_TABLES set"""
+              %destination)

Review Comment:
   ```suggestion
             _LOGGER.warning(
               "Table %s was not found. "
               "Table will be removed from local cache and bundle will retry. "
               "This sometimes occurs due to the table being deleted while a "
               "streaming job is running.",
               destination)
   ```



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.warning(
+              """Table %d was not found.
+              Table will be removed from _KNOWN_TABLES and bundle will retry.
+              This sometimes occurs due to the table being deleted while a 
+              streaming job is running and the destination was previously 
+              added to the _KNOWN_TABLES set"""
+              %destination)
+        raise

Review Comment:
   ```suggestion
             return self._flush_batch(destination)
           raise
   ```
   
   If we are in the 404 case, we should retry flushing the batch again. This may already be the case with Dataflow workers retrying the bundle, but I'm not sure if this applies to other runners. Python portable runner will exit instead of retrying.



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   ```suggestion
             _KNOWN_TABLES.remove(destination)
             self._create_table_if_needed(bigquery_tools.parse_table_reference(destination), self.schema)
   ```
   
   We will actually need to create the table again manually.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1436898057

   Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] closed pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
URL: https://github.com/apache/beam/pull/25325


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1476587314

   Hey @tomlynchRNA, my [first two change requests](https://github.com/apache/beam/pull/25325#pullrequestreview-1292176699) still need to be addressed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   > I believe after raising, the bundle will retry, _create_table_if_needed is called again
   
   Ahh you're right, ignore my suggestion.
   
   > I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
   
   The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
   
   The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (e.g. DirectRunner fails the pipeline; DataflowRunner retries a failed bundle 3 times then fails the pipeline).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1454059720

   .waiting on author


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1505819199

   Hi @tomlynchRNA, just checking - do you need any further assistance on this PR or the feedback is clear. Thanks for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1677242870

   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1101712082


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+                        "This sometimes occurs due to the table being deleted while a streaming job is running and "
+                        "the destination was previously added to the _KNOWN_TABLES".format(destination))

Review Comment:
   I think this exception warrants a `_LOGGER.warning` level log. 



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+                        "This sometimes occurs due to the table being deleted while a streaming job is running and "
+                        "the destination was previously added to the _KNOWN_TABLES".format(destination))

Review Comment:
   Also it's recommended to use printf-style `%s`. ie.  _LOGGER.info("Table %s was ...", destination)
   
   This optimizes by postponing the formatting to when the message is outputted.



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+                        "This sometimes occurs due to the table being deleted while a streaming job is running and "
+                        "the destination was previously added to the _KNOWN_TABLES".format(destination))

Review Comment:
   Lint check says these lines are too long, try breaking them down to more lines.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102126253


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)
+          _LOGGER.warning(
+              """Table %d was not found.
+              Table will be removed from _KNOWN_TABLES and bundle will retry.
+              This sometimes occurs due to the table being deleted while a 
+              streaming job is running and the destination was previously 
+              added to the _KNOWN_TABLES set"""
+              %destination)

Review Comment:
   ```suggestion
             _LOGGER.warning(
               "Table %s was not found. Will remove table from local cache"
               "and recreate it in BigQuery. The bundle will retry afterwards. "
               "This sometimes occurs due to the table being deleted while a "
               "streaming job is running.",
               destination)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   > I believe after raising, the bundle will retry, _create_table_if_needed is called again
   
   Ahh you're right, ignore my suggestion.
   
   > I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
   
   The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
   
   The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (e.g. DirectRunner and DataflowRunner retry a failed bundle 3 times then fail the pipeline).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146127690


##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
                 },
                 create_disposition='CREATE_NEVER',
                 method='STREAMING_INSERTS'))
+    self.assertEqual(1, mock_send.call_count)\
+
+  @mock.patch('time.sleep')
+  @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+  def test_insert_all_table_not_found_errors(

Review Comment:
   I don't think this actually tests the changes you're making. It mocks `insert_rows_json` but doesn't return a 404 error. 
   
   An integration test is probably more appropriate for these changes (instead of writing to a dummy table). And we should have a check at the beginning of the test to make sure we're running with TestDataflowRunner: 
   ```
   if not isinstance(self.test_pipeline.runner, TestDataflowRunner):
         self.skipTest("This test will invoke a bundle retry, which needs TestDataflowRunner")
   ```



##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
                 },
                 create_disposition='CREATE_NEVER',
                 method='STREAMING_INSERTS'))
+    self.assertEqual(1, mock_send.call_count)\
+
+  @mock.patch('time.sleep')
+  @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+  def test_insert_all_table_not_found_errors(

Review Comment:
   The approach you take in this test looks good (adding to KNOWN_TABLES before the write), let's do that with a real write to BQ. I suggest moving this test to `BigQueryStreamingInsertTransformIntegrationTests` and make use of the output table created in `setUp`:
   https://github.com/apache/beam/blob/92abdf636c52cce7c067593000bc846f507b0595/sdks/python/apache_beam/io/gcp/bigquery_test.py#L1606-L1620



##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
                 },
                 create_disposition='CREATE_NEVER',
                 method='STREAMING_INSERTS'))
+    self.assertEqual(1, mock_send.call_count)\
+
+  @mock.patch('time.sleep')
+  @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+  def test_insert_all_table_not_found_errors(

Review Comment:
   P.S. you can use `BigqueryFullResultMatcher` to make sure the data arrived at the table correctly. Example below: 
   https://github.com/apache/beam/blob/92abdf636c52cce7c067593000bc846f507b0595/sdks/python/apache_beam/io/gcp/bigquery_test.py#L1657-L1661



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1097324853


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -405,6 +405,8 @@ def chain_after(result):
 from apache_beam.utils.annotations import deprecated
 from apache_beam.utils.annotations import experimental
 
+from google.api_core.exceptions import ClientError, GoogleAPICallError

Review Comment:
   Current tests are failing with `ModuleNotFoundError: No module named 'google.api_core'`. This import should be in the try block on line 410



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404:

Review Comment:
   Also maybe a comment here describing that sometimes a table can get deleted in the middle of a streaming job.



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404:

Review Comment:
   Can you also add a helpful log message here explaining that the previously seen destination X no longer exists, so it will be removed from local cache and bundle will retry.



##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404:

Review Comment:
   ```suggestion
           if e.code == 404 and destination in  _KNOWN_TABLES:
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421559650

   Python 3.7 postcommits failing due to #25359


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420755353

   Run Python 3.10 PostCommit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420754836

   
   
   Run Python 3.10 PostCommit
   --
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421563192

   Also please fix formatting errors; you can go to `sdks/python` and run `tox -e py3-yapf && ../../gradlew lint` and see what needs fixing. Check [Python Tips](https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks) for more details


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] codecov[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1418551736

   # [Codecov](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#25325](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (be9c95b) into [master](https://codecov.io/gh/apache/beam/commit/067e4db747b04adf517f12d951176827d55b4b09?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (067e4db) will **decrease** coverage by `0.02%`.
   > The diff coverage is `88.88%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master   #25325      +/-   ##
   ==========================================
   - Coverage   72.96%   72.94%   -0.02%     
   ==========================================
     Files         743      745       +2     
     Lines       99037    99182     +145     
   ==========================================
   + Hits        72266    72353      +87     
   - Misses      25405    25463      +58     
     Partials     1366     1366              
   ```
   
   | Flag | Coverage Δ | |
   |---|---|---|
   | python | `82.45% <88.88%> (-0.05%)` | :arrow_down: |
   
   Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
   
   | [Impacted Files](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `71.49% <88.88%> (+0.14%)` | :arrow_up: |
   | [.../apache\_beam/runners/interactive/dataproc/types.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kYXRhcHJvYy90eXBlcy5weQ==) | `93.10% <0.00%> (-3.45%)` | :arrow_down: |
   | [sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=) | `95.12% <0.00%> (-2.44%)` | :arrow_down: |
   | [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.42% <0.00%> (-0.13%)` | :arrow_down: |
   | [...thon/apache\_beam/ml/inference/sklearn\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3NrbGVhcm5faW5mZXJlbmNlLnB5) | `96.25% <0.00%> (-0.05%)` | :arrow_down: |
   | [...thon/apache\_beam/ml/inference/pytorch\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3B5dG9yY2hfaW5mZXJlbmNlLnB5) | `0.00% <0.00%> (ø)` | |
   | [...hon/apache\_beam/ml/inference/tensorrt\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3RlbnNvcnJ0X2luZmVyZW5jZS5weQ==) | `0.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/ml/inference/utils.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3V0aWxzLnB5) | `100.00% <0.00%> (ø)` | |
   | [...am/examples/inference/run\_inference\_side\_inputs.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvaW5mZXJlbmNlL3J1bl9pbmZlcmVuY2Vfc2lkZV9pbnB1dHMucHk=) | `35.00% <0.00%> (ø)` | |
   | [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.71% <0.00%> (+0.12%)` | :arrow_up: |
   | ... and [2 more](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   :mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423492870

   hi, @Abacn, unfortunately, got the same error still. 
   
   I've tried running the `gen_protos.py` file and I get the below error message (FYI im running an M2 mac)
   
   ```bash
   org/apache/beam/model/interactive/v1/beam_interactive_api.proto:36:1: warning: Import google/protobuf/timestamp.proto is unused.
   Writing mypy to org/apache/beam/model/pipeline/v1/external_transforms_pb2.pyi
   Writing mypy to org/apache/beam/model/pipeline/v1/schema_pb2.pyi
   Writing mypy to org/apache/beam/model/pipeline/v1/metrics_pb2.pyi
   Writing mypy to org/apache/beam/model/pipeline/v1/endpoints_pb2.pyi
   Writing mypy to org/apache/beam/model/pipeline/v1/beam_runner_api_pb2.pyi
   Writing mypy to org/apache/beam/model/pipeline/v1/standard_window_fns_pb2.pyi
   Writing mypy to org/apache/beam/model/job_management/v1/beam_artifact_api_pb2.pyi
   Writing mypy to org/apache/beam/model/job_management/v1/beam_expansion_api_pb2.pyi
   Writing mypy to org/apache/beam/model/job_management/v1/beam_job_api_pb2.pyi
   Writing mypy to org/apache/beam/model/fn_execution/v1/beam_fn_api_pb2.pyi
   Writing mypy to org/apache/beam/model/fn_execution/v1/beam_provision_api_pb2.pyi
   Writing mypy to org/apache/beam/model/interactive/v1/beam_interactive_api_pb2.pyi
   Traceback (most recent call last):
     File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 561, in <module>
       generate_proto_files(force=True)
     File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 555, in generate_proto_files
       generate_urn_files(proto_package, PYTHON_OUTPUT_PATH)
     File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 126, in generate_urn_files
       import google.protobuf.pyext._message as pyext_message
   ImportError: dlopen(/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so, 0x0002): tried: '/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (no such file), '/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423498250

   @ahmedabu98 as you suggested i tried running the below but still got the same error
   
   ```bash
   pip install --no-binary :all: grpcio --ignore-installed
   pip install --no-binary :all: grpcio-tools --ignore-installed
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1580048341

   Hi @tvalentyn @ahmedabu98  yeah can you let us know what tests need to be written so we can complete this PR. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tomlynchRNA commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK

Posted by "tomlynchRNA (via GitHub)" <gi...@apache.org>.
tomlynchRNA commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1142942877


##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
       insert_ids = [None for r in rows_and_insert_ids]
     else:
       insert_ids = [r[1] for r in rows_and_insert_ids]
-
     while True:
+      errors = []
+      passed = False
       start = time.time()
-      passed, errors = self.bigquery_wrapper.insert_rows(
-          project_id=table_reference.projectId,
-          dataset_id=table_reference.datasetId,
-          table_id=table_reference.tableId,
-          rows=rows,
-          insert_ids=insert_ids,
-          skip_invalid_rows=True,
-          ignore_unknown_values=self.ignore_unknown_columns)
+      try:
+        passed, errors = self.bigquery_wrapper.insert_rows(
+              project_id=table_reference.projectId,
+              dataset_id=table_reference.datasetId,
+              table_id=table_reference.tableId,
+              rows=rows,
+              insert_ids=insert_ids,
+              skip_invalid_rows=True,
+              ignore_unknown_values=self.ignore_unknown_columns)
+      except (ClientError, GoogleAPICallError) as e:
+        if e.code == 404 and destination in _KNOWN_TABLES:
+          _KNOWN_TABLES.remove(destination)

Review Comment:
   Can you clarify? I believe after raising, the bundle will retry, _create_table_if_needed is called again and will pass the check for the table name being in KNOWN_TABLES. Then the table may be created again (depending on create_disposition).
   
   I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org