Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 22:39:41 UTC

[GitHub] [beam] damccorm opened a new issue, #21316: ZeroDivisionError if source bundle smaller than 1mb

damccorm opened a new issue, #21316:
URL: https://github.com/apache/beam/issues/21316

   Hi,
   I built a (GCP) Dataflow pipeline with apache-beam (version 2.24.0), using Python.
   The pipeline's stages are:
   1. reading from BigQuery
   2. running a custom function (a ParDo that is passed a class inheriting from 'beam.DoFn') that transforms the data that was read.
   3. writing the transformed data to BigQuery
    
   The pipeline works fine when stage 1 is querying a small amount of data, but when it is querying data from the last six months (lots of data), I am getting this error:
    
   ```
   
   apache_beam.runners.dataflow.dataflow_runner.DataflowRuntimeException: Dataflow pipeline failed. State: FAILED, Error:
   Traceback (most recent call last):
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/batchworker.py", line 649, in do_work
       work_executor.execute()
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/executor.py", line 179, in execute
       op.start()
     File "dataflow_worker/native_operations.py", line 38, in dataflow_worker.native_operations.NativeReadOperation.start
     File "dataflow_worker/native_operations.py", line 39, in dataflow_worker.native_operations.NativeReadOperation.start
     File "dataflow_worker/native_operations.py", line 44, in dataflow_worker.native_operations.NativeReadOperation.start
     File "dataflow_worker/native_operations.py", line 48, in dataflow_worker.native_operations.NativeReadOperation.start
     File "/usr/local/lib/python3.7/site-packages/dataflow_worker/workercustomsources.py", line 69, in __iter__
       self._source.start_position, self._source.stop_position)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 78, in get_range_tracker
       start_position, stop_position, self._source_bundles)
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 131, in __init__
       self._compute_cumulative_weights(source_bundles[start[0]:last]) + [1] *
     File "/usr/local/lib/python3.7/site-packages/apache_beam/io/concat_source.py", line 154, in _compute_cumulative_weights
       running_total.append(max(min_diff, min(1, running_total[-1] + w / total)))
   ZeroDivisionError: float division by zero
   
   ```
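
To make the failing line easier to reason about, here is a minimal sketch of the arithmetic in the last traceback frame. It is not the Beam implementation: `cumulative_weights` and `cumulative_weights_guarded` are hypothetical stand-ins, the `min_diff` default is an arbitrary placeholder, and the idea that every bundle reports a weight of zero is only an assumption about how `total` could end up as zero.

```python
def cumulative_weights(weights, min_diff=1e-5):
    """Hypothetical stand-in: normalized running totals of bundle weights."""
    total = sum(weights)
    running_total = [0.0]
    for w in weights:
        # Same shape as the expression in the traceback:
        # w / total raises ZeroDivisionError when total is zero.
        running_total.append(
            max(min_diff, min(1, running_total[-1] + w / total)))
    return running_total


def cumulative_weights_guarded(weights, min_diff=1e-5):
    """Variant that treats all bundles as equally weighted if the total is zero."""
    if not weights:
        return [0.0]
    total = sum(weights)
    if total == 0:
        # Assumed fallback: give every bundle the same weight.
        weights = [1.0] * len(weights)
        total = float(len(weights))
    running_total = [0.0]
    for w in weights:
        running_total.append(
            max(min_diff, min(1, running_total[-1] + w / total)))
    return running_total
```

With this sketch, `cumulative_weights([0, 0])` raises `ZeroDivisionError`, while the guarded variant returns evenly spaced fractions for the same input. Whether zero weights are what actually happens on the Dataflow worker is not confirmed by the traceback alone.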
   
   
    
   I saw this issue in your repository:
    
   https://issues.apache.org/jira/browse/BEAM-10004
    
   which describes the same problem I have. However, even though that fix is already included in my version of apache-beam (2.24.0), I still get this error.
    
   Can you please guide me in fixing this issue?
   Is it something I am doing wrong, or is this a bug in Beam?
    
   Thank you in advance and have a good one!
    
   Inbar Dekel
   
   Imported from Jira [BEAM-13721](https://issues.apache.org/jira/browse/BEAM-13721). Original Jira may contain additional context.
   Reported by: inbar.dekel.

