Posted to issues@geode.apache.org by "Hale Bales (Jira)" <ji...@apache.org> on 2022/05/10 21:54:00 UTC

[jira] [Commented] (GEODE-9906) Unable to reconnect a node after SO patching "15 seconds have elapsed while waiting for replies"

    [ https://issues.apache.org/jira/browse/GEODE-9906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17534595#comment-17534595 ] 

Hale Bales commented on GEODE-9906:
-----------------------------------

Hi [~mb1977], thank you for raising this issue. Since it was reported against such an old version of Geode, I am going to close this ticket. If you can, please try to reproduce it on develop or the most recent release, and reopen the ticket if it is still a problem. Please reach out on the dev list if you have trouble using the most recent version. Thanks!

> Unable to reconnect a node after SO patching "15 seconds have elapsed while waiting for replies"
> ------------------------------------------------------------------------------------------------
>
>                 Key: GEODE-9906
>                 URL: https://issues.apache.org/jira/browse/GEODE-9906
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Marco Baldessari
>            Priority: Major
>
> I have a cluster of 4 nodes in total (3 servers and 1 management node) that was working properly.
> At the beginning of the month we planned to patch the OS, and we started with the first server node using this procedure:
> - Stop service
> - OS patching
> - Server restart
> - Start service
> The service on the first patched node, named "serverA", fails to restart with this error:
> Log entries from the cluster join:
> serverA:
> | INFO  | region-dm-12                 | ache.geode.internal.tcp.Connection | --> Connection: shared=true ordered=false failed to connect to peer 10.237.110.195( Server serverB:9993)<ec><v127>:1024 because: java.net.ConnectException: Connection timed out (Connection timed out)
> | WARN  | region-dm-12               | ache.geode.internal.tcp.Connection | --> Connection: Attempting reconnect to peer  10.237.110.195( Server serverB:9993)<ec><v127>:1024
>  
> ServerMgmt:
> | WARN  | pool-3-thread-1              | tributed.internal.ReplyProcessor21     | --> 15 seconds have elapsed while waiting for replies: <CreateRegionProcessor$CreateRegionReplyProcessor 44180 waiting for 1 replies from [10.237.110.194( Server serverA:632)<ec><v174>:1024]> on 10.237.110.225( Management:6033)<ec><v111>:1024 whose current membership list is: [[10.237.110.196( Server serverC:16805)<ec><v136>:1024, 10.237.110.225( Management:6033)<ec><v111>:1024, 10.237.110.195( Server serverB:9993)<ec><v127>:1024, 10.237.110.194( Server serverA:632)<ec><v174>:1024]]
>  
> The connection between the systems was verified with tcpdump; UDP port 1024 is working fine.
>  
> We have tried redeploying the service and have made numerous attempts, but we always get the same error during startup.
> Any ideas? Thank you.
> Marco.


