You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 19:03:07 UTC
[GitHub] [beam] damccorm opened a new issue, #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
damccorm opened a new issue, #20717:
URL: https://github.com/apache/beam/issues/20717
Actual Behaviour
The apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on the exception below and the entire pipeline crashes.
Expected Behaviour
The apache_beam.io.gcp.experimental.spannerio.WriteToSpanner module handles exceptions gracefully and does not crash the pipeline.
It isn't possible to implement error-handling in pipeline code. It would be easier to just handle the exception inside the `process` function.
Please see the logs below for more information.
```
main.py:91: FutureWarning: ReadFromSpanner is experimental. No backwards-compatibility guarantees.main.py:91:
FutureWarning: ReadFromSpanner is experimental. No backwards-compatibility guarantees. sql=sqlmain.py:102:
FutureWarning: WriteToSpanner is experimental. No backwards-compatibility guarantees. database_id=importer_options.DEST_SPANNER_DATASET_ID,warning:
sdist: standard file not found: should have one of README, README.rst, README.txt, README.md
WARNING:root:Make
sure that locally built Python SDK docker image has Python 3.7 interpreter.Traceback (most recent call
last): File "main.py", line 110, in <module> run() File "main.py", line 106, in run result.wait_until_finish()
File "/home/notion/.local/share/virtualenvs/ods-to-ods-bigquery-EZMTrMjb/lib/python3.7/site-packages/apache_beam/runners/dataflow/dataflow_runner.py",
line 1665, in wait_until_finish self)apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException:
Dataflow pipeline failed. State: FAILED, Error:Traceback (most recent call last): File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py",
line 57, in error_remapped_callable return callable_(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py",
line 826, in __call__ return _end_unary_response_blocking(state, call, False, None) File "/usr/local/lib/python3.7/site-packages/grpc/_channel.py",
line 729, in _end_unary_response_blocking raise _InactiveRpcError(state)grpc._channel._InactiveRpcError:
<_InactiveRpcError of RPC that terminated with: status = StatusCode.ALREADY_EXISTS details = "Row [0000085c-0fca-5c04-a538-3d44e4ec9d23]
in table TestItems already exists" debug_error_string = "{"created":"@1612247321.800986805","description":"Error
received from peer ipv4:XXX.XXX.XX.XX:XXX","file":"src/core/lib/surface/call.cc","file_line":1055,"grpc_message":"Row
[0000085c-0fca-5c04-a538-3d44e4ec9d23] in table TestItems already exists","grpc_status":6}">
The above
exception was the direct cause of the following exception:
Traceback (most recent call last): File
"apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File "apache_beam/runners/common.py",
line 588, in apache_beam.runners.common.SimpleInvoker.invoke_process File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/experimental/spannerio.py",
line 1098, in process batch_func(**m.kwargs) File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/database.py",
line 476, in __exit__ self._batch.commit() File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/batch.py",
line 154, in commit metadata=metadata, File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/gapic/spanner_client.py",
line 1556, in commit request, retry=retry, timeout=timeout, metadata=metadata File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py",
line 145, in __call__ return wrapped_func(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py",
line 286, in retry_wrapped_func on_error=on_error, File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py",
line 184, in retry_target return target() File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py",
line 214, in func_with_timeout return func(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py",
line 59, in error_remapped_callable six.raise_from(exceptions.from_grpc_error(exc), exc) File "<string>",
line 3, in raise_fromgoogle.api_core.exceptions.AlreadyExists: 409 Row [0000085c-0fca-5c04-a538-3d44e4ec9d23]
in table TestItems already exists
During handling of the above exception, another exception occurred:
Traceback
(most recent call last): File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py",
line 649, in do_work work_executor.execute() File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py",
line 179, in execute op.start() File "dataflow_worker/shuffle_operations.py", line 63, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 64, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 79, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 80, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "dataflow_worker/shuffle_operations.py", line 84, in dataflow_worker.shuffle_operations.GroupedShuffleReadOperation.start
File "apache_beam/runners/worker/operations.py", line 359, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "dataflow_worker/shuffle_operations.py", line 261, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process
File "dataflow_worker/shuffle_operations.py", line 268, in dataflow_worker.shuffle_operations.BatchGroupAlsoByWindowsOperation.process
File "apache_beam/runners/worker/operations.py", line 359, in apache_beam.runners.worker.operations.Operation.output
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 587, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 587, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 768, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 891, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 158, in apache_beam.runners.worker.operations.ConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 768, in apache_beam.runners.common.PerWindowInvoker.invoke_process
File "apache_beam/runners/common.py", line 886, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 587, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 587, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1306, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 587, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "apache_beam/runners/common.py", line 1401, in apache_beam.runners.common._OutputProcessor.process_outputs
File "apache_beam/runners/worker/operations.py", line 221, in apache_beam.runners.worker.operations.SingletonConsumerSet.receive
File "apache_beam/runners/worker/operations.py", line 718, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/worker/operations.py", line 719, in apache_beam.runners.worker.operations.DoOperation.process
File "apache_beam/runners/common.py", line 1241, in apache_beam.runners.common.DoFnRunner.process File
"apache_beam/runners/common.py", line 1321, in apache_beam.runners.common.DoFnRunner._reraise_augmented
File "/usr/local/lib/python3.7/site-packages/future/utils/__init__.py", line 446, in raise_with_traceback
raise exc.with_traceback(traceback) File "apache_beam/runners/common.py", line 1239, in apache_beam.runners.common.DoFnRunner.process
File "apache_beam/runners/common.py", line 588, in apache_beam.runners.common.SimpleInvoker.invoke_process
File "/usr/local/lib/python3.7/site-packages/apache_beam/io/gcp/experimental/spannerio.py", line 1098,
in process batch_func(**m.kwargs) File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/database.py",
line 476, in __exit__ self._batch.commit() File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/batch.py",
line 154, in commit metadata=metadata, File "/usr/local/lib/python3.7/site-packages/google/cloud/spanner_v1/gapic/spanner_client.py",
line 1556, in commit request, retry=retry, timeout=timeout, metadata=metadata File "/usr/local/lib/python3.7/site-packages/google/api_core/gapic_v1/method.py",
line 145, in __call__ return wrapped_func(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py",
line 286, in retry_wrapped_func on_error=on_error, File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py",
line 184, in retry_target return target() File "/usr/local/lib/python3.7/site-packages/google/api_core/timeout.py",
line 214, in func_with_timeout return func(*args, **kwargs) File "/usr/local/lib/python3.7/site-packages/google/api_core/grpc_helpers.py",
line 59, in error_remapped_callable six.raise_from(exceptions.from_grpc_error(exc), exc) File "<string>",
line 3, in raise_fromgoogle.api_core.exceptions.AlreadyExists: 409 Row [0000085c-0fca-5c04-a538-3d44e4ec9d23]
in table TestItems already exists [while running 'Write Mutations to destination Spanner/Writing to
spanner']
```
Imported from Jira [BEAM-11741](https://issues.apache.org/jira/browse/BEAM-11741). Original Jira may contain additional context.
Reported by: bitnahian.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] github-actions[bot] closed issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
URL: https://github.com/apache/beam/issues/20717
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] svetakvsundhar commented on issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #20717:
URL: https://github.com/apache/beam/issues/20717#issuecomment-1532178754
.close-issue
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] svetakvsundhar commented on issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #20717:
URL: https://github.com/apache/beam/issues/20717#issuecomment-1530025504
I see the error is coming from https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/experimental/spannerio.py#L1260.
However, it seems like the error is handled inside the `process` function, in the `except` block. Is the description here proposing to add logic to understand if the primary key is already inserted in Spanner, before the `insert` operation [here](https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/experimental/spannerio.py#L1255)?
cc: @ahmedabu98 @johnjcasey
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] svetakvsundhar commented on issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #20717:
URL: https://github.com/apache/beam/issues/20717#issuecomment-1529983941
Any clarity on which `process` function the description of this issue is referring to? I'm assuming the reference is to somewhere in here: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/spanner.py#L272?
cc: @ahmedabu98 @johnjcasey
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] svetakvsundhar commented on issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #20717:
URL: https://github.com/apache/beam/issues/20717#issuecomment-1531899273
Also @damccorm to add issue context.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] svetakvsundhar commented on issue #20717: apache_beam.io.gcp.experimental.spannerio.WriteToSpanner fails on duplicate primary key
Posted by "svetakvsundhar (via GitHub)" <gi...@apache.org>.
svetakvsundhar commented on issue #20717:
URL: https://github.com/apache/beam/issues/20717#issuecomment-1532178646
Going to close this issue for the reason mentioned in the first comment. Feel free to reopen this if I have misunderstood.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org