You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "tomlynchRNA (via GitHub)" <gi...@apache.org> on 2023/02/06 05:33:24 UTC
[GitHub] [beam] tomlynchRNA opened a new pull request, #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
tomlynchRNA opened a new pull request, #25325:
URL: https://github.com/apache/beam/pull/25325
Ref #25225
------------------------
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
- [x] Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- [ ] ~~Update `CHANGES.md` with noteworthy changes.~~
- [ ] ~~If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).~~
See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/get-started-contributing/#make-the-reviewers-job-easier).
To check the build health, please visit [https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md](https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md)
GitHub Actions Tests Status (on master branch)
------------------------------------------------------------------------------------------------
[![Build python source distribution and wheels](https://github.com/apache/beam/workflows/Build%20python%20source%20distribution%20and%20wheels/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Build+python+source+distribution+and+wheels%22+branch%3Amaster+event%3Aschedule)
[![Python tests](https://github.com/apache/beam/workflows/Python%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Python+Tests%22+branch%3Amaster+event%3Aschedule)
[![Java tests](https://github.com/apache/beam/workflows/Java%20Tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Java+Tests%22+branch%3Amaster+event%3Aschedule)
[![Go tests](https://github.com/apache/beam/workflows/Go%20tests/badge.svg?branch=master&event=schedule)](https://github.com/apache/beam/actions?query=workflow%3A%22Go+tests%22+branch%3Amaster+event%3Aschedule)
See [CI.md](https://github.com/apache/beam/blob/master/CI.md) for more information about GitHub Actions CI.
cc @ragyibrahim
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421561183
@tomlynchRNA this LGTM but before we can merge this, can you write up a test to confirm these changes work as intended?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1497416166
waiting on author
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1097324853
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -405,6 +405,8 @@ def chain_after(result):
from apache_beam.utils.annotations import deprecated
from apache_beam.utils.annotations import experimental
+from google.api_core.exceptions import ClientError, GoogleAPICallError
Review Comment:
Current tests are failing with `ModuleNotFoundError: No module named 'google.api_core'`. This import should be in the `try:` block on line 410
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102127074
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
```suggestion
_KNOWN_TABLES.remove(destination)
self._create_table_if_needed(bigquery_tools.parse_table_reference(destination), self.schema)
```
We will actually need to create the table again here, otherwise we will continue running into the 404 error.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1448079838
Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1497388292
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @damccorm for label python.
R: @pabloem for label io.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1666845946
This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
> I believe after raising, the bundle will retry, _create_table_if_needed is called again
Ahh you're right, ignore my suggestion.
> I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (DirectRunner fails the pipeline, DataflowRunner 3 times then fails the pipeline).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] Abacn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423496431
Yeah it's a known issue. See this comment to fix:
https://github.com/apache/beam/issues/22742#issuecomment-1218216468
@ahmedabu98 uses linux and it's fine. I have an M1 mac experiencing same issue. The problem is that protobuf installed from wheel missing some internal methods used by beam
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1491837708
Reminder, please take a look at this pr: @tvalentyn @chamikaramj
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102095271
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+ "This sometimes occurs due to the table being deleted while a streaming job is running and "
+ "the destination was previously added to the _KNOWN_TABLES".format(destination))
Review Comment:
@ahmedabu98 addressed all the above in the latest commit [88205fa](https://github.com/apache/beam/pull/25325/commits/88205faef4436d4139f4ae792d8688639ac3a6b2)
Hopefully, that should do it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1418563015
Assigning reviewers. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @jrmccluskey for label python.
R: @ahmedabu98 for label io.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
The PR bot will only process comments in the main thread (not review comments).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420754662
Run Python 3.7 PostCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] Abacn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423087776
I never got tox yapf working locally. I am doing this to fix lint:
```
# Run from root beam repo dir
pip install yapf==0.29.0
git diff master --name-only | grep "\.py$" | xargs yapf --in-place
```
hopefully it works
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423626173
@Abacn @ahmedabu98
Finally worked. See output below:
```bash
tox -e py3-yapf
.pkg: _optional_hooks> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: prepare_metadata_for_build_wheel> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
py3-yapf: install_package> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/16/apache-beam-2.46.0.dev0.tar.gz
py3-yapf: commands_pre[0]> python --version
Python 3.10.9
py3-yapf: commands_pre[1]> pip --version
pip 23.0 from /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/lib/python3.10/site-packages/pip (python 3.10)
py3-yapf: commands_pre[2]> pip check
No broken requirements found.
py3-yapf: commands_pre[3]> bash /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/scripts/run_tox_cleanup.sh
py3-yapf: commands[0]> yapf --version
yapf 0.29.0
py3-yapf: commands[1]> time yapf --in-place --parallel --recursive apache_beam
11.89 real 125.12 user 1.01 sys
py3-yapf: commands_post[0]> bash /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/scripts/run_tox_cleanup.sh
.pkg: _exit> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
py3-yapf: OK (17.31=setup[4.82]+cmd[0.01,0.14,0.36,0.03,0.04,11.89,0.03] seconds)
congratulations :) (17.36 seconds)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1437693899
Thanks for the contribution but I'm not sure I agree with the assumptions of the corresponding issue. Added a comment here: https://github.com/apache/beam/issues/25225#issuecomment-1437693540
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1451773222
Assigning new set of reviewers because Pr has gone too long without review. If you would like to opt out of this review, comment `assign to next reviewer`:
R: @tvalentyn for label python.
R: @chamikaramj for label io.
Available commands:
- `stop reviewer notifications` - opt out of the automated review tooling
- `remind me after tests pass` - tag the comment author after tests pass
- `waiting on author` - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] chamikaramj commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "chamikaramj (via GitHub)" <gi...@apache.org>.
chamikaramj commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1131734190
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.warning(
+ """Table %d was not found.
+ Table will be removed from _KNOWN_TABLES and bundle will retry.
+ This sometimes occurs due to the table being deleted while a
+ streaming job is running and the destination was previously
+ added to the _KNOWN_TABLES set"""
+ %destination)
+ raise
Review Comment:
I think it should be safe to just raise here. The runner should either re-try or fail the pipeline.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421947099
Hey, @ahmedabu98 I've tried running the commands to check for linting changes that need to be made, however, I get the below error:
```bash
tox -e py3-yapf && ../../gradlew lint
.pkg: _optional_hooks> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: get_requires_for_build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: prepare_metadata_for_build_wheel> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
.pkg: build_sdist> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
py3-yapf: install_package> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz
Processing ./target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [10 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/setup.py", line 180, in <module>
generate_protos_first()
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/setup.py", line 150, in generate_protos_first
gen_protos.generate_proto_files()
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/tmp/pip-req-build-seoqvdlv/gen_protos.py", line 476, in generate_proto_files
raise RuntimeError(error_msg)
RuntimeError: Not in apache git tree, unable to find proto definitions.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
py3-yapf: exit 1 (0.61 seconds) /Users/ragy/Documents/RNA/GitHub/beam/sdks/python> target/.tox/py3-yapf/bin/python /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/py3-yapf/bin/pip install --retries 10 --force-reinstall --no-deps /Users/ragy/Documents/RNA/GitHub/beam/sdks/python/target/.tox/.tmp/package/4/apache-beam-2.46.0.dev0.tar.gz pid=80407
.pkg: _exit> python /opt/homebrew/lib/python3.10/site-packages/pyproject_api/_backend.py True setuptools.build_meta __legacy__
py3-yapf: FAIL code 1 (1.78 seconds)
evaluation failed :( (1.85 seconds)
```
I checked the issues and it seems that this was a problem in apache-beam 2.43 and was fixed. Please see the issue: https://github.com/apache/beam/issues/23227
Not sure how to get past this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102126253
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.warning(
+ """Table %d was not found.
+ Table will be removed from _KNOWN_TABLES and bundle will retry.
+ This sometimes occurs due to the table being deleted while a
+ streaming job is running and the destination was previously
+ added to the _KNOWN_TABLES set"""
+ %destination)
Review Comment:
```suggestion
_LOGGER.warning(
"Table %s was not found. "
"Table will be removed from local cache and bundle will retry. "
"This sometimes occurs due to the table being deleted while a "
"streaming job is running.",
destination)
```
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.warning(
+ """Table %d was not found.
+ Table will be removed from _KNOWN_TABLES and bundle will retry.
+ This sometimes occurs due to the table being deleted while a
+ streaming job is running and the destination was previously
+ added to the _KNOWN_TABLES set"""
+ %destination)
+ raise
Review Comment:
```suggestion
return self._flush_batch(destination)
raise
```
If we are in the 404 case, we should retry flushing the batch again. This may already be the case with Dataflow workers retrying the bundle, but I'm not sure if this applies to other runners. Python portable runner will exit instead of retrying.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
```suggestion
_KNOWN_TABLES.remove(destination)
self._create_table_if_needed(bigquery_tools.parse_table_reference(destination), self.schema)
```
We will actually need to create the table again manually.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1436898057
Reminder, please take a look at this pr: @jrmccluskey @ahmedabu98
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] closed pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
URL: https://github.com/apache/beam/pull/25325
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1476587314
Hey @tomlynchRNA, my [first two change requests](https://github.com/apache/beam/pull/25325#pullrequestreview-1292176699) still need to be addressed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
> I believe after raising, the bundle will retry, _create_table_if_needed is called again
Ahh you're right, ignore my suggestion.
> I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (e.g. DirectRunner fails the pipeline; DataflowRunner retries a failed bundle 3 times then fails the pipeline).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1454059720
.waiting on author
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "tvalentyn (via GitHub)" <gi...@apache.org>.
tvalentyn commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1505819199
Hi @tomlynchRNA, just checking - do you need any further assistance on this PR or the feedback is clear. Thanks for your contribution!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1677242870
This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1101712082
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+ "This sometimes occurs due to the table being deleted while a streaming job is running and "
+ "the destination was previously added to the _KNOWN_TABLES".format(destination))
Review Comment:
I think this exception warrants a `_LOGGER.warning` level log.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+ "This sometimes occurs due to the table being deleted while a streaming job is running and "
+ "the destination was previously added to the _KNOWN_TABLES".format(destination))
Review Comment:
Also it's recommended to use printf-style `%s`. ie. _LOGGER.info("Table %s was ...", destination)
This optimizes by postponing the formatting to when the message is outputted.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,26 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.info("Table {} was not found. Table will be removed from _KNOWN_TABLES and bundle will retry. "
+ "This sometimes occurs due to the table being deleted while a streaming job is running and "
+ "the destination was previously added to the _KNOWN_TABLES".format(destination))
Review Comment:
Lint check says these lines are too long, try breaking them down to more lines.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1102126253
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
+ _LOGGER.warning(
+ """Table %d was not found.
+ Table will be removed from _KNOWN_TABLES and bundle will retry.
+ This sometimes occurs due to the table being deleted while a
+ streaming job is running and the destination was previously
+ added to the _KNOWN_TABLES set"""
+ %destination)
Review Comment:
```suggestion
_LOGGER.warning(
"Table %s was not found. Will remove table from local cache"
"and recreate it in BigQuery. The bundle will retry afterwards. "
"This sometimes occurs due to the table being deleted while a "
"streaming job is running.",
destination)
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146050389
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
> I believe after raising, the bundle will retry, _create_table_if_needed is called again
Ahh you're right, ignore my suggestion.
> I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
The retry strategy refers to errors we receive when inserting individual rows (e.g. schema mismatch) and they come after BQ tries inserting in the table. Those failed row insertions may be retried according to the strategy (and that logic is handled directly in this file), but the errors don't cause the whole bundle to fail.
The error we're looking at here is from the HTTP request (the table itself doesn't exist). This error will cause the bundle to fail and it's up to the runner to decide how it deals with this (e.g. DirectRunner and DataflowRunner retry a failed bundle 3 times then fail the pipeline).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1146127690
##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
},
create_disposition='CREATE_NEVER',
method='STREAMING_INSERTS'))
+ self.assertEqual(1, mock_send.call_count)\
+
+ @mock.patch('time.sleep')
+ @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+ def test_insert_all_table_not_found_errors(
Review Comment:
I don't think this actually tests the changes you're making. It mocks `insert_rows_json` but doesn't return a 404 error.
An integration test is probably more appropriate for these changes (instead of writing to a dummy table). And we should have a check at the beginning of the test to make sure we're running with TestDataflowRunner:
```
if not isinstance(self.test_pipeline.runner, TestDataflowRunner):
self.skipTest("This test will invoke a bundle retry, which needs TestDataflowRunner")
```
##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
},
create_disposition='CREATE_NEVER',
method='STREAMING_INSERTS'))
+ self.assertEqual(1, mock_send.call_count)\
+
+ @mock.patch('time.sleep')
+ @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+ def test_insert_all_table_not_found_errors(
Review Comment:
The approach you take in this test looks good (adding to KNOWN_TABLES before the write), let's do that with a real write to BQ. I suggest moving this test to `BigQueryStreamingInsertTransformIntegrationTests` and make use of the output table created in `setUp`:
https://github.com/apache/beam/blob/92abdf636c52cce7c067593000bc846f507b0595/sdks/python/apache_beam/io/gcp/bigquery_test.py#L1606-L1620
##########
sdks/python/apache_beam/io/gcp/bigquery_test.py:
##########
@@ -1082,6 +1082,39 @@ def test_insert_all_unretriable_errors(
},
create_disposition='CREATE_NEVER',
method='STREAMING_INSERTS'))
+ self.assertEqual(1, mock_send.call_count)\
+
+ @mock.patch('time.sleep')
+ @mock.patch('google.cloud.bigquery.Client.insert_rows_json')
+ def test_insert_all_table_not_found_errors(
Review Comment:
P.S. you can use `BigqueryFullResultMatcher` to make sure the data arrived at the table correctly. Example below:
https://github.com/apache/beam/blob/92abdf636c52cce7c067593000bc846f507b0595/sdks/python/apache_beam/io/gcp/bigquery_test.py#L1657-L1661
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1097324853
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -405,6 +405,8 @@ def chain_after(result):
from apache_beam.utils.annotations import deprecated
from apache_beam.utils.annotations import experimental
+from google.api_core.exceptions import ClientError, GoogleAPICallError
Review Comment:
Current tests are failing with `ModuleNotFoundError: No module named 'google.api_core'`. This import should be in the try block on line 410
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
Also maybe a comment here describing that sometimes a table can get deleted in the middle of a streaming job.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
Can you also add a helpful log message here explaining that the previously seen destination X no longer exists, so it will be removed from local cache and bundle will retry.
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1553,23 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404:
Review Comment:
```suggestion
if e.code == 404 and destination in _KNOWN_TABLES:
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421559650
Python 3.7 postcommits failing due to #25359
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420755353
Run Python 3.10 PostCommit
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1420754836
Run Python 3.10 PostCommit
--
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ahmedabu98 commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1421563192
Also please fix formatting errors; you can go to `sdks/python` and run `tox -e py3-yapf && ../../gradlew lint` and see what needs fixing. Check [Python Tips](https://cwiki.apache.org/confluence/display/BEAM/Python+Tips#PythonTips-LintandFormattingChecks) for more details
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] codecov[bot] commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "codecov[bot] (via GitHub)" <gi...@apache.org>.
codecov[bot] commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1418551736
# [Codecov](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
> Merging [#25325](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (be9c95b) into [master](https://codecov.io/gh/apache/beam/commit/067e4db747b04adf517f12d951176827d55b4b09?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (067e4db) will **decrease** coverage by `0.02%`.
> The diff coverage is `88.88%`.
```diff
@@ Coverage Diff @@
## master #25325 +/- ##
==========================================
- Coverage 72.96% 72.94% -0.02%
==========================================
Files 743 745 +2
Lines 99037 99182 +145
==========================================
+ Hits 72266 72353 +87
- Misses 25405 25463 +58
Partials 1366 1366
```
| Flag | Coverage Δ | |
|---|---|---|
| python | `82.45% <88.88%> (-0.05%)` | :arrow_down: |
Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#carryforward-flags-in-the-pull-request-comment) to find out more.
| [Impacted Files](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
|---|---|---|
| [sdks/python/apache\_beam/io/gcp/bigquery.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vaW8vZ2NwL2JpZ3F1ZXJ5LnB5) | `71.49% <88.88%> (+0.14%)` | :arrow_up: |
| [.../apache\_beam/runners/interactive/dataproc/types.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9pbnRlcmFjdGl2ZS9kYXRhcHJvYy90eXBlcy5weQ==) | `93.10% <0.00%> (-3.45%)` | :arrow_down: |
| [sdks/python/apache\_beam/utils/interactive\_utils.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vdXRpbHMvaW50ZXJhY3RpdmVfdXRpbHMucHk=) | `95.12% <0.00%> (-2.44%)` | :arrow_down: |
| [...hon/apache\_beam/runners/worker/bundle\_processor.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy93b3JrZXIvYnVuZGxlX3Byb2Nlc3Nvci5weQ==) | `93.42% <0.00%> (-0.13%)` | :arrow_down: |
| [...thon/apache\_beam/ml/inference/sklearn\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3NrbGVhcm5faW5mZXJlbmNlLnB5) | `96.25% <0.00%> (-0.05%)` | :arrow_down: |
| [...thon/apache\_beam/ml/inference/pytorch\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3B5dG9yY2hfaW5mZXJlbmNlLnB5) | `0.00% <0.00%> (ø)` | |
| [...hon/apache\_beam/ml/inference/tensorrt\_inference.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3RlbnNvcnJ0X2luZmVyZW5jZS5weQ==) | `0.00% <0.00%> (ø)` | |
| [sdks/python/apache\_beam/ml/inference/utils.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vbWwvaW5mZXJlbmNlL3V0aWxzLnB5) | `100.00% <0.00%> (ø)` | |
| [...am/examples/inference/run\_inference\_side\_inputs.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vZXhhbXBsZXMvaW5mZXJlbmNlL3J1bl9pbmZlcmVuY2Vfc2lkZV9pbnB1dHMucHk=) | `35.00% <0.00%> (ø)` | |
| [sdks/python/apache\_beam/runners/common.py](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-c2Rrcy9weXRob24vYXBhY2hlX2JlYW0vcnVubmVycy9jb21tb24ucHk=) | `88.71% <0.00%> (+0.12%)` | :arrow_up: |
| ... and [2 more](https://codecov.io/gh/apache/beam/pull/25325?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
:mega: We’re building smart automated test selection to slash your CI/CD build times. [Learn more](https://about.codecov.io/iterative-testing/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423492870
hi, @Abacn, unfortunately, got the same error still.
I've tried running the `gen_protos.py` file and I get the below error message (FYI im running an M2 mac)
```bash
org/apache/beam/model/interactive/v1/beam_interactive_api.proto:36:1: warning: Import google/protobuf/timestamp.proto is unused.
Writing mypy to org/apache/beam/model/pipeline/v1/external_transforms_pb2.pyi
Writing mypy to org/apache/beam/model/pipeline/v1/schema_pb2.pyi
Writing mypy to org/apache/beam/model/pipeline/v1/metrics_pb2.pyi
Writing mypy to org/apache/beam/model/pipeline/v1/endpoints_pb2.pyi
Writing mypy to org/apache/beam/model/pipeline/v1/beam_runner_api_pb2.pyi
Writing mypy to org/apache/beam/model/pipeline/v1/standard_window_fns_pb2.pyi
Writing mypy to org/apache/beam/model/job_management/v1/beam_artifact_api_pb2.pyi
Writing mypy to org/apache/beam/model/job_management/v1/beam_expansion_api_pb2.pyi
Writing mypy to org/apache/beam/model/job_management/v1/beam_job_api_pb2.pyi
Writing mypy to org/apache/beam/model/fn_execution/v1/beam_fn_api_pb2.pyi
Writing mypy to org/apache/beam/model/fn_execution/v1/beam_provision_api_pb2.pyi
Writing mypy to org/apache/beam/model/interactive/v1/beam_interactive_api_pb2.pyi
Traceback (most recent call last):
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 561, in <module>
generate_proto_files(force=True)
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 555, in generate_proto_files
generate_urn_files(proto_package, PYTHON_OUTPUT_PATH)
File "/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/gen_protos.py", line 126, in generate_urn_files
import google.protobuf.pyext._message as pyext_message
ImportError: dlopen(/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so, 0x0002): tried: '/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64')), '/System/Volumes/Preboot/Cryptexes/OS/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (no such file), '/Users/ragy/Documents/RNA/GitHub/beam/sdks/python/.venv/lib/python3.10/site-packages/google/protobuf/pyext/_message.cpython-310-darwin.so' (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64'))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1423498250
@ahmedabu98 as you suggested i tried running the below but still got the same error
```bash
pip install --no-binary :all: grpcio --ignore-installed
pip install --no-binary :all: grpcio-tools --ignore-installed
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ragyibrahim commented on pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "ragyibrahim (via GitHub)" <gi...@apache.org>.
ragyibrahim commented on PR #25325:
URL: https://github.com/apache/beam/pull/25325#issuecomment-1580048341
Hi @tvalentyn @ahmedabu98 yeah can you let us know what tests need to be written so we can complete this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tomlynchRNA commented on a diff in pull request #25325: Remove from _KNOWN_TABLES on 404 insert to allow table re-creation for Python SDK
Posted by "tomlynchRNA (via GitHub)" <gi...@apache.org>.
tomlynchRNA commented on code in PR #25325:
URL: https://github.com/apache/beam/pull/25325#discussion_r1142942877
##########
sdks/python/apache_beam/io/gcp/bigquery.py:
##########
@@ -1551,17 +1552,30 @@ def _flush_batch(self, destination):
insert_ids = [None for r in rows_and_insert_ids]
else:
insert_ids = [r[1] for r in rows_and_insert_ids]
-
while True:
+ errors = []
+ passed = False
start = time.time()
- passed, errors = self.bigquery_wrapper.insert_rows(
- project_id=table_reference.projectId,
- dataset_id=table_reference.datasetId,
- table_id=table_reference.tableId,
- rows=rows,
- insert_ids=insert_ids,
- skip_invalid_rows=True,
- ignore_unknown_values=self.ignore_unknown_columns)
+ try:
+ passed, errors = self.bigquery_wrapper.insert_rows(
+ project_id=table_reference.projectId,
+ dataset_id=table_reference.datasetId,
+ table_id=table_reference.tableId,
+ rows=rows,
+ insert_ids=insert_ids,
+ skip_invalid_rows=True,
+ ignore_unknown_values=self.ignore_unknown_columns)
+ except (ClientError, GoogleAPICallError) as e:
+ if e.code == 404 and destination in _KNOWN_TABLES:
+ _KNOWN_TABLES.remove(destination)
Review Comment:
Can you clarify? I believe after raising, the bundle will retry, _create_table_if_needed is called again and will pass the check for the table name being in KNOWN_TABLES. Then the table may be created again (depending on create_disposition).
I've also realised, if the create_disposition is 'CREATE_NEVER' and the insert_retry_strategy is set to 'RETRY_ALWAYS', the pipeline will be stuck in a loop, no?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org