Posted to issues@beam.apache.org by "Janek Bevendorff (Jira)" <ji...@apache.org> on 2022/03/08 13:45:00 UTC
[jira] [Created] (BEAM-14070) Beam worker closing gRPC connection with many workers and large shuffle sizes
Janek Bevendorff created BEAM-14070:
---------------------------------------
Summary: Beam worker closing gRPC connection with many workers and large shuffle sizes
Key: BEAM-14070
URL: https://issues.apache.org/jira/browse/BEAM-14070
Project: Beam
Issue Type: Bug
Components: sdk-py-core
Affects Versions: 2.36.0
Reporter: Janek Bevendorff
When I run a job with many workers (100 or more) and large shuffle sizes (millions of records and/or several GB), my workers fail unexpectedly with the following error:
{code:java}
python -m apache_beam.runners.worker.sdk_worker_main
E0308 12:59:18.067442934 724 chttp2_transport.cc:1103] Received a GOAWAY with error code ENHANCE_YOUR_CALM and debug data equal to "too_many_pings"
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 264, in <module>
    main(sys.argv)
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker_main.py", line 155, in main
    sdk_harness.run()
  File "/usr/local/lib/python3.8/site-packages/apache_beam/runners/worker/sdk_worker.py", line 234, in run
    for work_request in self._control_stub.Control(get_responses()):
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
    return self._next()
  File "/usr/local/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
    raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "Socket closed"
    debug_error_string = "{"created":"@1646744358.118371750","description":"Error received from peer ipv6:[::1]:34305","file":"src/core/lib/surface/call.cc","file_line":1074,"grpc_message":"Socket closed","grpc_status":14}"
>{code}
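For triage across many workers, the GOAWAY line in the log above can be picked out of worker logs mechanically. gRPC sends a GOAWAY with ENHANCE_YOUR_CALM and "too_many_pings" when a peer's HTTP/2 pings arrive more often than the receiving side's keepalive policy allows. The following is only a minimal sketch and not part of Beam or gRPC: the names `parse_goaway` and `is_too_many_pings` are hypothetical, and the regular expression assumes the log line format seen in this report, which is not a documented interface.

```python
import re
from typing import Optional, Tuple

# Assumption: the gRPC C-core log line has the shape seen in this report.
# This is not a documented or stable format.
_GOAWAY_RE = re.compile(
    r'Received a GOAWAY with error code (?P<code>\w+) '
    r'and debug data equal to "(?P<debug>[^"]*)"'
)


def parse_goaway(line: str) -> Optional[Tuple[str, str]]:
    """Return (error_code, debug_data) if the line reports a GOAWAY, else None."""
    match = _GOAWAY_RE.search(line)
    if match is None:
        return None
    return match.group("code"), match.group("debug")


def is_too_many_pings(line: str) -> bool:
    """True if the line is the keepalive-related GOAWAY from this report."""
    return parse_goaway(line) == ("ENHANCE_YOUR_CALM", "too_many_pings")
```

Running `is_too_many_pings` over each line of a worker's stderr would separate this failure from other causes of `StatusCode.UNAVAILABLE`, which by itself only says that the socket was closed.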
This is probably related to, or even the same as, BEAM-12448 or BEAM-6258. However, one of them is already marked as fixed in a previous version, and both reports have a long tail of unreadable auto-generated comments, so I decided to create a new issue.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)