You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/08/09 13:47:14 UTC

[GitHub] [beam] yeandy opened a new issue, #22643: [Bug]: grpc connection issue

yeandy opened a new issue, #22643:
URL: https://github.com/apache/beam/issues/22643

   ### What happened?
   
   Seeing this error in Python [PostCommits](https://ci-beam.apache.org/job/beam_PostCommit_Python37_PR/411/console) 
   ```
   21:36:07 Exception in thread read_grpc_client_inputs:
   21:36:07 Traceback (most recent call last):
   21:36:07   File "/usr/lib/python3.7/threading.py", line 926, in _bootstrap_inner
   21:36:07     self.run()
   21:36:07   File "/usr/lib/python3.7/threading.py", line 870, in run
   21:36:07     self._target(*self._args, **self._kwargs)
   21:36:07   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/data_plane.py", line 671, in <lambda>
   21:36:07     target=lambda: self._read_inputs(elements_iterator),
   21:36:07   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/sdks/python/apache_beam/runners/worker/data_plane.py", line 654, in _read_inputs
   21:36:07     for elements in elements_iterator:
   21:36:07   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/python3.7/site-packages/grpc/_channel.py", line 426, in __next__
   21:36:07     return self._next()
   21:36:07   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python37_PR/src/build/gradleenv/2022703441/lib/python3.7/site-packages/grpc/_channel.py", line 826, in _next
   21:36:07     raise self
   21:36:07 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
   21:36:07 	status = StatusCode.CANCELLED
   21:36:07 	details = "Multiplexer hanging up"
   21:36:07 	debug_error_string = "{"created":"@1660012354.173326998","description":"Error received from peer ipv6:[::1]:37919","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Multiplexer hanging up","grpc_status":1}"
   ```
   
   Or like this
   ```
   15:30:23 Exception in thread read_state:
   15:30:24 Traceback (most recent call last):
   15:30:24   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
   15:30:24 Exception in thread run_worker_1-1:
   15:30:24 Traceback (most recent call last):
   15:30:24   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
   15:30:24 ERROR:apache_beam.runners.worker.data_plane:Failed to read inputs in the data plane.
   15:30:24 Traceback (most recent call last):
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/sdks/python/apache_beam/runners/worker/data_plane.py", line 654, in _read_inputs
   15:30:24     for elements in elements_iterator:
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
   15:30:24     return self._next()
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
   15:30:24     raise self
   15:30:24 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
   15:30:24 	status = StatusCode.UNAVAILABLE
   15:30:24 	details = "Socket closed"
   15:30:24 	debug_error_string = "{"created":"@1659990624.371994304","description":"Error received from peer ipv6:[::1]:42753","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Socket closed","grpc_status":14}"
   15:30:24 >
   15:30:24 Exception in thread read_grpc_client_inputs:
   15:30:24 Traceback (most recent call last):
   15:30:24   File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
   15:30:24     self.run()
   15:30:24   File "/usr/lib/python3.8/threading.py", line 870, in run
   15:30:24     self.run()
   15:30:24   File "/usr/lib/python3.8/threading.py", line 870, in run
   15:30:24     self.run()
   15:30:24   File "/usr/lib/python3.8/threading.py", line 870, in run
   15:30:24     self._target(*self._args, **self._kwargs)
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 251, in run
   15:30:24     self._target(*self._args, **self._kwargs)
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/sdks/python/apache_beam/runners/worker/sdk_worker.py", line 1008, in pull_responses
   15:30:24     self._target(*self._args, **self._kwargs)
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/sdks/python/apache_beam/runners/worker/data_plane.py", line 671, in <lambda>
   15:30:24     for work_request in self._control_stub.Control(get_responses()):
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
   15:30:24     for response in responses:
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
   15:30:24     target=lambda: self._read_inputs(elements_iterator),
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/sdks/python/apache_beam/runners/worker/data_plane.py", line 654, in _read_inputs
   15:30:24     return self._next()
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
   15:30:24     return self._next()
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 826, in _next
   15:30:24     raise self
   15:30:24 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
   15:30:24 	status = StatusCode.UNAVAILABLE
   15:30:24 	details = "Connection reset by peer"
   15:30:24 	debug_error_string = "{"created":"@1659990624.372101270","description":"Error received from peer ipv6:[::1]:40577","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Connection reset by peer","grpc_status":14}"
   15:30:24 >
   15:30:24     for elements in elements_iterator:
   15:30:24   File "/home/jenkins/jenkins-slave/workspace/beam_PostCommit_Python38/src/build/gradleenv/2022703442/lib/python3.8/site-packages/grpc/_channel.py", line 426, in __next__
   15:30:24     raise self
   15:30:24 grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
   15:30:24 	status = StatusCode.UNAVAILABLE
   15:30:24 	details = "Socket closed"
   15:30:24 	debug_error_string = "{"created":"@1659990624.372044384","description":"Error received from peer ipv6:[::1]:43851","file":"src/core/lib/surface/call.cc","file_line":966,"grpc_message":"Socket closed","grpc_status":14}"
   ```
   
   ### Issue Priority
   
   Priority: 2
   
   ### Issue Component
   
   Component: test-failures


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on issue #22643: [Bug]: grpc connection issue

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on issue #22643:
URL: https://github.com/apache/beam/issues/22643#issuecomment-1209566366

   I'm not sure that's the root cause of the timeout, I've seen errors like that before, they typically are the root cause of an actual test failure. In this case it looks like we've been flirting with the three hour timeout for a while, e.g. this run from last week took 2h55m: https://ci-beam.apache.org/job/beam_PostCommit_Python37/5552/
   
   I'd suggest we first double the timeout for these PostCommits. Then we can come back and try to root cause the increase in runtime.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #22643: [Bug]: grpc connection issue

Posted by GitBox <gi...@apache.org>.
Abacn commented on issue #22643:
URL: https://github.com/apache/beam/issues/22643#issuecomment-1211022848

   As tests taking longer and longer, we may also want to split some IO tests from the generic postcommit as the ongoing work in java...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] yeandy commented on issue #22643: [Bug]: grpc connection issue

Posted by GitBox <gi...@apache.org>.
yeandy commented on issue #22643:
URL: https://github.com/apache/beam/issues/22643#issuecomment-1209582674

   Yeah I've seen these before tool. That makes sense. Do you mean to double timeout for `ExerciseMetricsPipelineTest` tests, and not all Python [postcommits](https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PostCommit_Python.groovy#L33)?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on issue #22643: [Bug]: grpc connection issue

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on issue #22643:
URL: https://github.com/apache/beam/issues/22643#issuecomment-1211231228

   Yeah I agree that we should do something to address the long-running PostCommits after this is resolved. We've had some discussions about splitting up the PostCommit like Java is doing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] TheNeuralBit commented on issue #22643: [Bug]: grpc connection issue

Posted by GitBox <gi...@apache.org>.
TheNeuralBit commented on issue #22643:
URL: https://github.com/apache/beam/issues/22643#issuecomment-1209594089

   No I mean for all Python PostCommits (your link is correct), since that's the timeout we're hitting. Per @kennknowles - timeouts are for catching runaway jobs, not for punishing us when the jobs start taking too long. The timeout shouldn't be that close to the actual runtime of the job.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org