Posted to issues@geode.apache.org by "Marco Baldessari (Jira)" <ji...@apache.org> on 2021/12/20 14:25:00 UTC

[jira] [Created] (GEODE-9906) Unable to reconnect a node after SO patching "15 seconds have elapsed while waiting for replies"

Marco Baldessari created GEODE-9906:
---------------------------------------

             Summary: Unable to reconnect a node after SO patching "15 seconds have elapsed while waiting for replies"
                 Key: GEODE-9906
                 URL: https://issues.apache.org/jira/browse/GEODE-9906
             Project: Geode
          Issue Type: Bug
            Reporter: Marco Baldessari


I have a cluster of 4 nodes (3 servers and 1 management node) that was working properly.

At the beginning of the month we planned to patch the OS, and we started with the first server node using this procedure:

- Stop service
- OS patching
- Server restart
- Start service

The service on the first patched node, "serverA", fails to start with this error:

Log entries from the cluster join:
serverA:
| INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
| WARN  | region-dm-12               | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer  10.237.110.195( Server serverB:9993)<ec><v127>:1024
 
ServerMgmt:
| WARN  | pool-3-thread-1              | tributed.internal.ReplyProcessor21     | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225( Management:6033)<ec><v111>:1024 whose current membership list is: [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( Management:6033)<ec><v111>:1024, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.194( Server serverA:632)<ec><v174>:1024]]
 
Connectivity between the systems was verified with tcpdump; UDP traffic on port 1024 is flowing fine.
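
The serverA log above reports a TCP connect timeout (java.net.ConnectException), so a TCP-level check between the nodes may be a useful complement to the UDP capture. The following is only a minimal sketch in Python: the helper name tcp_reachable is hypothetical, and the peer's TCP port is an assumption that has to be supplied by whoever runs it, since it does not appear in the log entries.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Hypothetical helper for illustration; host and port must be supplied
    by the caller (the peer's TCP port is not shown in the logs above).
    """
    try:
        # create_connection performs the full TCP handshake or raises OSError
        # (refused, unreachable, or timed out).
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from serverA against serverB (10.237.110.195) with the TCP port the peer actually listens on, a False result would point to a firewall or routing change introduced by the patching rather than to the service itself.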
 
We have redeployed the service and made numerous attempts, but we always get the same error during startup.

Any ideas? Thank you.

Marco.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)