Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 22:39:41 UTC
[GitHub] [beam] damccorm opened a new issue, #21316: ZeroDivisionError if source bundle smaller than 1mb
damccorm opened a new issue, #21316:
URL: https://github.com/apache/beam/issues/21316
Hi,
I built a GCP Dataflow pipeline with apache-beam (version 2.24.0), using Python.
The pipeline's stages are:
1. reading from BigQuery
2. running a custom function (via ParDo, passing it a class that inherits from 'beam.DoFn') that transforms the data that was read
3. writing the transformed data to BigQuery
The pipeline works fine when stage 1 queries a small amount of data, but when it queries data from the last six months (a lot of data), I get this error:
```
apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
    work_executor.execute()
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
    op.start()
  File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
  File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
  File "/usr/local/lib/python3.7/site-packages/dataflow_worker/workercustomsources.py", line 69, in __iter__
    self._source.start_position, self._source.stop_position)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 78, in get_range_tracker
    start_position, stop_position, self._source_bundles)
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 131, in __init__
    self._compute_cumulative_weights(source_bundles[start[0]:last]) + [1]
  File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 154, in _compute_cumulative_weights
    running_total.append(max(min_diff, min(1, running_total[-1] + w / total)))
ZeroDivisionError: float division by zero
```
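To illustrate the failure, here is a minimal sketch in plain Python that reuses only the expression from the last frame of the traceback. It is not Beam's actual implementation: the function `compute_cumulative_weights`, the helper `fails_with_zero_division`, and the value of `min_diff` are hypothetical stand-ins. It assumes `total` is the sum of the bundle weights, which is a guess rather than something the traceback confirms.

```python
def compute_cumulative_weights(weights):
    """Simplified stand-in for the step shown in the traceback.

    NOTE: hypothetical reconstruction, not Beam's code. Only the
    expression inside append() is copied from the traceback.
    """
    # Assumption: total is the sum of all bundle weights.
    total = float(sum(weights))
    running_total = [0.0]
    min_diff = 1e-5  # assumed value, for illustration only
    for w in weights:
        # Raises ZeroDivisionError when total == 0.0.
        running_total.append(
            max(min_diff, min(1, running_total[-1] + w / total)))
    return running_total


def fails_with_zero_division(weights):
    """Return True if the computation divides by zero for these weights."""
    try:
        compute_cumulative_weights(weights)
    except ZeroDivisionError:
        return True
    return False


if __name__ == "__main__":
    print(fails_with_zero_division([1, 3]))  # non-zero weights
    print(fails_with_zero_division([0, 0]))  # every weight is zero
```

Under that assumption, the division only fails when every weight passed in is zero, which may be what happens when each source bundle reports a size that rounds down to zero.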
I found this issue in your tracker:
https://issues.apache.org/jira/browse/BEAM-10004
It describes the same problem I have. However, even though that fix is already included in my version of apache-beam (2.24.0), I still get this error.
Can you please guide me in fixing this issue?
Am I doing something wrong, or is this a bug in Beam?
Thank you in advance and have a good one!
Inbar Dekel
Imported from Jira [BEAM-13721](https://issues.apache.org/jira/browse/BEAM-13721). Original Jira may contain additional context.
Reported by: inbar.dekel.