Posted to issues@geode.apache.org by "Kirk Lund (Jira)" <ji...@apache.org> on 2021/04/06 17:19:00 UTC

[jira] [Commented] (GEODE-9024) Geode Cache Server stops accepting client connections

    [ https://issues.apache.org/jira/browse/GEODE-9024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17315729#comment-17315729 ] 

Kirk Lund commented on GEODE-9024:
----------------------------------

[~leonfin] Hi Leon, I recommend asking detailed questions about this on the Geode dev list: dev@geode.apache.org.

> Geode Cache Server stops accepting client connections
> -----------------------------------------------------
>
>                 Key: GEODE-9024
>                 URL: https://issues.apache.org/jira/browse/GEODE-9024
>             Project: Geode
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.13.1
>            Reporter: Leon Finker
>            Priority: Critical
>
> We are frequently encountering the following lock-up on 1.13.1:
> 1. The client (bridge) acceptor thread is locked up in this stack:
> {noformat}
> "Handshaker 0.0.0.0/0.0.0.0:40011 Thread 2" #219 daemon prio=5 os_prio=0 tid=0x00007f755c007000 nid=0x44a2 runnable [0x00007f75847c7000]
>  java.lang.Thread.State: RUNNABLE
>  at java.net.SocketInputStream.socketRead0(Native Method)
>  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>  at java.net.SocketInputStream.read(SocketInputStream.java:170)
>  at java.net.SocketInputStream.read(SocketInputStream.java:141)
>  at java.net.SocketInputStream.read(SocketInputStream.java:223)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.getCommunicationModeForNonSelector(AcceptorImpl.java:1559)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.handleNewClientConnection(AcceptorImpl.java:1430)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$handOffNewClientConnection$4(AcceptorImpl.java:1341)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$407/2146094985.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> 2. All 4 Handshaker threads in that pool are stuck in this stack:
> {noformat}
> "Handshaker 0.0.0.0/0.0.0.0:40011 Thread 2" #219 daemon prio=5 os_prio=0 tid=0x00007f755c007000 nid=0x44a2 runnable [0x00007f75847c7000]
>  java.lang.Thread.State: RUNNABLE
>  at java.net.SocketInputStream.socketRead0(Native Method)
>  at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
>  at java.net.SocketInputStream.read(SocketInputStream.java:170)
>  at java.net.SocketInputStream.read(SocketInputStream.java:141)
>  at java.net.SocketInputStream.read(SocketInputStream.java:223)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.getCommunicationModeForNonSelector(AcceptorImpl.java:1559)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.handleNewClientConnection(AcceptorImpl.java:1430)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl.lambda$handOffNewClientConnection$4(AcceptorImpl.java:1341)
>  at org.apache.geode.internal.cache.tier.sockets.AcceptorImpl$$Lambda$407/2146094985.run(Unknown Source)
>  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Is there a reason no socket read timeout is set here?
> {noformat}
> private CommunicationMode getCommunicationModeForNonSelector(Socket socket) throws IOException {
>   socket.setSoTimeout(0);
>   socketCreator.forCluster().handshakeIfSocketIsSSL(socket, acceptTimeout);
>   byte communicationModeByte = (byte) socket.getInputStream().read();
> {noformat}
> This blocks all new client connections to the server. Why not set a read timeout? For some reason it is explicitly set to 0 (infinite). This seems to have changed in this commit:
> https://github.com/apache/geode/commit/e423cd8fa24329baf11fd6871a5ea6dc0f362b6c
> Before that change, socket.setSoTimeout(0); was called after the socket read.
> The cache server can be brought to a complete halt simply by opening 4 telnet sessions to the cache server port, which amounts to a denial of service.
> This happens with the default CacheServer.MaxThreads=0. Perhaps the workaround is to set CacheServer.MaxThreads=N, since the code then seems to take the selector-based path, which does apply a timeout?
> Thank you
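The following is a minimal, self-contained sketch in plain Java (no Geode classes) of the behavior described in the issue, useful for reasoning about the proposed fix. It assumes a fixed pool of 4 handshake threads with an unbounded queue; Geode's actual Handshaker pool may be configured differently. The names readCommunicationModeByte, handshake, and run are hypothetical, with readCommunicationModeByte standing in for the first read in getCommunicationModeForNonSelector. The 200 ms timeout is an arbitrary illustrative value, not a Geode default. With a read timeout of 0, four silent connections (like the telnet sessions) occupy all four threads and a fifth client that does send its mode byte is never served; with a bounded timeout, the silent peers are dropped and the fifth client gets through.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

final class HandshakeTimeoutSketch {

  // Hypothetical stand-in for the first read in getCommunicationModeForNonSelector.
  // timeoutMs == 0 mirrors 1.13.1 (wait forever); a positive value bounds the wait.
  static int readCommunicationModeByte(Socket socket, int timeoutMs) throws IOException {
    socket.setSoTimeout(timeoutMs);
    int mode = socket.getInputStream().read(); // SocketTimeoutException if the peer stays silent
    socket.setSoTimeout(0); // switch to infinite reads only after the first byte has arrived
    return mode;
  }

  // Task run by a "handshaker" thread; a timeout or closed peer frees the thread.
  static void handshake(Socket accepted, int timeoutMs, AtomicInteger handled) {
    try (Socket socket = accepted) {
      if (readCommunicationModeByte(socket, timeoutMs) >= 0) {
        handled.incrementAndGet();
      }
    } catch (IOException e) {
      // SocketTimeoutException (silent peer) or socket closed: give the thread back to the pool.
    }
  }

  // Loopback server with 4 handshaker threads. Opens idleClients silent connections
  // (like telnet), then one client that sends a mode byte. Returns how many
  // connections had their mode byte read within waitMs.
  static int run(int timeoutMs, int idleClients, long waitMs) {
    AtomicInteger handled = new AtomicInteger();
    ExecutorService handshakers = Executors.newFixedThreadPool(4);
    Socket[] idle = new Socket[idleClients];
    try (ServerSocket server = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
      Thread acceptor = new Thread(() -> {
        while (true) {
          try {
            Socket accepted = server.accept();
            handshakers.execute(() -> handshake(accepted, timeoutMs, handled));
          } catch (IOException | RejectedExecutionException e) {
            return; // server socket closed or pool shut down: stop accepting
          }
        }
      });
      acceptor.setDaemon(true);
      acceptor.start();

      for (int i = 0; i < idleClients; i++) {
        idle[i] = new Socket(server.getInetAddress(), server.getLocalPort());
      }
      try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
        client.getOutputStream().write(100); // arbitrary communication-mode byte
        client.getOutputStream().flush();
        Thread.sleep(waitMs);
        return handled.get(); // sample before cleanup can unblock any thread
      }
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    } finally {
      for (Socket s : idle) {
        try {
          if (s != null) s.close();
        } catch (IOException ignored) {
          // best-effort cleanup
        }
      }
      handshakers.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("soTimeout=0:   handled " + run(0, 4, 500));
    System.out.println("soTimeout=200: handled " + run(200, 4, 1500));
  }
}
```

On a typical host the first call should report 0 handled connections and the second 1, although timing depends on the machine. The ordering in readCommunicationModeByte follows what the reporter describes from before commit e423cd8 (setSoTimeout(0) only after the first read), with an explicit bounded timeout assumed for that first read.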



--
This message was sent by Atlassian Jira
(v8.3.4#803005)