Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/03 22:50:47 UTC

[GitHub] [beam] kennknowles opened a new issue, #19156: Running ParDo in loop with DirectRunners raises RuntimeException

kennknowles opened a new issue, #19156:
URL: https://github.com/apache/beam/issues/19156

   The Python [load test of ParDo operation for SyntheticSources](https://github.com/apache/beam/blob/faff82860c66e4050f0cfa5e874ffe6035ed0c1c/sdks/python/apache_beam/testing/load_tests/par_do_test.py#L133) that I created contains a parametrized loop of ParDo steps that perform no operation besides reporting metrics (this issue). When I set the number of iterations above ~200 and ran the test on DirectRunner, I encountered test failures. The test outputs the whole (really long) pipeline log. Some test runs raised the following exception:
   
    
   ```
   Traceback (most recent call last):
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/load_tests/par_do_test.py", line 144, in testParDo
       result = p.run()
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/testing/test_pipeline.py", line 104, in run
       result = super(TestPipeline, self).run(test_runner_api)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 403, in run
       self.to_runner_api(), self.runner, self._options).run(False)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/pipeline.py", line 416, in run
       return self.runner.run_pipeline(self)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/direct/direct_runner.py", line 139, in run_pipeline
       return runner.run_pipeline(pipeline)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 229, in run_pipeline
       return self.run_via_runner_api(pipeline.to_runner_api())
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 232, in run_via_runner_api
       return self.run_stages(*self.create_stages(pipeline_proto))
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1015, in run_stages
       pcoll_buffers, safe_coders).process_bundle.metrics
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1132, in run_stage
       self._progress_frequency).process_bundle(data_input, data_output)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1388, in process_bundle
       result_future = self._controller.control_handler.push(process_bundle)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/portability/fn_api_runner.py", line 1260, in push
       response = self.worker.do_instruction(request)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 212, in do_instruction
       request.instruction_id)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 231, in process_bundle
       self.data_channel_factory)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 343, in __init__
       self.ops = self.create_execution_tree(self.process_bundle_descriptor)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 385, in create_execution_tree
       descriptor.transforms, key=topological_height, reverse=True)])
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 320, in wrapper
       result = cache[args] = func(*args)
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 368, in get_operation
       in descriptor.transforms[transform_id].outputs.items()
     File "/Users/kasia/Repos/beam/sdks/python/apache_beam/runners/worker/bundle_processor.py", line 367, in <dictcomp>
       for tag, pcoll_id
   ... (3 last lines repeated for long period)
   RuntimeError: maximum recursion depth exceeded
   ```
   
    
   
    
   
   From my observations, the iteration count at which the problem appears varies with the machine's resources. On my weaker computer, the test started failing at ~150 iterations. The test succeeds on Dataflow with 1000 iterations (I didn't check higher numbers).
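
The last frames of the traceback (`wrapper`, `get_operation`, `<dictcomp>`) repeat, which suggests that the execution tree is built by a memoized recursive function that descends once per chained transform. The sketch below is a simplified, hypothetical model of that pattern in plain Python, not Beam's actual code: `build_chain`, `build_recursive`, `build_iterative`, and `fails_with_recursion_error` are illustrative names, and a linear chain of integer step ids stands in for the looped ParDo steps. It shows how a long enough chain can exhaust the interpreter's recursion limit (raised as `RecursionError` in Python 3, a subclass of the `RuntimeError` shown above), while an iterative build over the same chain does not.

```python
import sys


def build_chain(n):
    """Hypothetical stand-in for the pipeline: step i feeds step i + 1."""
    return {i: ([i + 1] if i < n - 1 else []) for i in range(n)}


def memoize(func):
    """Cache results by argument tuple, similar in spirit to `wrapper` above."""
    cache = {}

    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper


def build_recursive(consumers, root=0):
    """Build (step, downstream) tuples by recursing into every consumer."""

    @memoize
    def get_operation(step):
        # Each chained step adds at least two stack frames here
        # (wrapper + get_operation) before any step can return.
        return (step, [get_operation(c) for c in consumers[step]])

    return get_operation(root)


def build_iterative(consumers):
    """Build the same structure from the last step backwards, no recursion.

    Assumes step ids increase downstream, which holds for build_chain only.
    """
    ops = {}
    for step in sorted(consumers, reverse=True):
        ops[step] = (step, [ops[c] for c in consumers[step]])
    return ops[0]


def fails_with_recursion_error(n):
    """Return True if the recursive build of an n-step chain overflows."""
    try:
        build_recursive(build_chain(n))
    except RecursionError:
        return True
    return False


if __name__ == "__main__":
    limit = sys.getrecursionlimit()
    print("short chain fails:", fails_with_recursion_error(10))
    print("chain as long as the limit fails:", fails_with_recursion_error(limit))
    print("iterative build root step:", build_iterative(build_chain(5 * limit))[0])
```

Because each chained step costs several stack frames, and some frames are already in use when the build starts, the recursive version fails at a chain length well below `sys.getrecursionlimit()`. That may be consistent with failures starting at a few hundred iterations, although the exact threshold in Beam would depend on details this sketch does not model.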
   
   The whole test output is provided in the attachments.
   
   Imported from Jira [BEAM-5777](https://issues.apache.org/jira/browse/BEAM-5777). Original Jira may contain additional context.
   Reported by: kasiak.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org