You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Sergey Korotkov (Jira)" <ji...@apache.org> on 2022/04/13 08:57:00 UTC

[jira] [Created] (IGNITE-16843) Timeout while thin client connection

Sergey Korotkov created IGNITE-16843:
----------------------------------------

             Summary: Timeout while thin client connection
                 Key: IGNITE-16843
                 URL: https://issues.apache.org/jira/browse/IGNITE-16843
             Project: Ignite
          Issue Type: Bug
            Reporter: Sergey Korotkov


In usecases with several active thin clients producing noticable load to cluster new thin clients can fail to connect with the *"ClientConnectionException: Channel is closed"* error in the *TcpClientChannel::handshake()* method.

On server side warning *"Unable to perform handshake within timeout [timeout=10000"* is logged.

The problem can be easily reproduced by several large putAlls invoked in parallel from several or single thin client.  Espesially for the TRANSACTIONAL caches.  But ATOMIC caches are also affected - the only difference is that for ATOMIC caches more parallelizm factor and larger batches for putAlls are needed.
----
The reason of the problem is a fact that a single queue is used in the ignite node to serve all thin client related requests (queue in the {_}GridThinClientExecutor{_}). Both working requests like _putAll_ and control ones like _handshake_ which are used for connection establishment. 

Working requests can live indefenitely in a queue (or at least longer then {_}handshake{_}).  On the other hand a special watchdog timer is scheduled in the ignite node to check if a _handshake_ request is processed in a timely manner ({_}ClientListenerNioListener::scheduleHandshakeTimeout{_}).  By default the 10 seconds timeout is used ({_}ClientConnectorConfiguration::handshakeTimeout{_}). If timeout expires the client session is closed forcibly.

So, if one or several thin clients fill queue with long operations new clients can not connect. 
----
The real usecase reveals the problem is as follows.
 * 4 nodes cluster, 64 cpu, 32Gb heap, 512Gb off-heap  each
 * One TRANSACTIONAL cache with backups=1
 * About 30Gb of data on each node
 * Several (upto 70-100 at the same time) thin clients loading data using putAlls in 50000 records batches. Client connects, loads 3 batches and disconnects (spark jobs in fact).
 * Default handshakeTimeout (10 secs) and threadPoolSize(8) in ClientConnectorConfiguration

----
Two ducktests were created to reproduce and isolate  the problem.

*{color:#FF0000}Note that asserts in testcases done in a way that test PASS means that problem IS reproduced.{color}*  Tests check that at least one thin client fails with the "Channel is closed" error and that server node log contains warning about the handshake timeout.

*ThinClientConcurrencyTest::test_many_thin_clients*

Mimics the above real life usecase. Several thin client processes invoke putAlls in several threads. There are two sets of parameters - one for TRANSACTIONAL and one for ATOMIC cache.

*ThinClientConcurrencyTest::test_one_greedy_thin_client*

Minimal test shows that a single thin client can produce such a load that another one can not connect.

***

The following full stack trace is logged on the client side:
{noformat}
org.apache.ignite.client.ClientConnectionException: Channel is closed
    at org.apache.ignite.internal.client.thin.TcpClientChannel.handshake(TcpClientChannel.java:595) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpClientChannel.<init>(TcpClientChannel.java:180) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel$ClientChannelHolder.getOrCreateChannel(ReliableChannel.java:917) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel$ClientChannelHolder.getOrCreateChannel(ReliableChannel.java:898) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel$ClientChannelHolder.access$200(ReliableChannel.java:847) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel.applyOnDefaultChannel(ReliableChannel.java:759) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel.applyOnDefaultChannel(ReliableChannel.java:731) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.ReliableChannel.channelsInit(ReliableChannel.java:702) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpIgniteClient.<init>(TcpIgniteClient.java:126) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpIgniteClient.<init>(TcpIgniteClient.java:102) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpIgniteClient.start(TcpIgniteClient.java:339) ~[classes/:?]
    at org.apache.ignite.Ignition.startClient(Ignition.java:615) ~[classes/:?]
    at org.apache.ignite.internal.ducktest.tests.thin_client_test.ThinClientDataGenerationApplication$PutJob.getClient(ThinClientDataGenerationApplication.java:181) ~[ignite-ducktests-2.
14.0-SNAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.apache.ignite.internal.ducktest.tests.thin_client_test.ThinClientDataGenerationApplication$PutJob.call(ThinClientDataGenerationApplication.java:138) [ignite-ducktests-2.14.0-S
NAPSHOT.jar:2.14.0-SNAPSHOT]
    at org.apache.ignite.internal.ducktest.tests.thin_client_test.ThinClientDataGenerationApplication$PutJob.call(ThinClientDataGenerationApplication.java:111) [ignite-ducktests-2.14.0-S
NAPSHOT.jar:2.14.0-SNAPSHOT]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
Caused by: org.apache.ignite.IgniteCheckedException: Channel is closed
    at org.apache.ignite.internal.util.IgniteUtils.cast(IgniteUtils.java:7863) ~[classes/:?]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.resolve(GridFutureAdapter.java:260) ~[classes/:?]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:191) ~[classes/:?]
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:141) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpClientChannel.handshake(TcpClientChannel.java:591) ~[classes/:?]
    ... 18 more
Caused by: org.apache.ignite.client.ClientConnectionException: Channel is closed
    at org.apache.ignite.internal.client.thin.TcpClientChannel.close(TcpClientChannel.java:213) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.TcpClientChannel.onDisconnected(TcpClientChannel.java:199) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.io.gridnioserver.GridNioClientConnection.onDisconnected(GridNioClientConnection.java:95) ~[classes/:?]
    at org.apache.ignite.internal.client.thin.io.gridnioserver.GridNioClientListener.onDisconnected(GridNioClientListener.java:42) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioFilterChain$TailFilter.onSessionClosed(GridNioFilterChain.java:256) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:95) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioCodecFilter.onSessionClosed(GridNioCodecFilter.java:71) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioFilterAdapter.proceedSessionClosed(GridNioFilterAdapter.java:95) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$HeadFilter.onSessionClosed(GridNioServer.java:3690) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioFilterChain.onSessionClosed(GridNioFilterChain.java:149) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2877) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.close(GridNioServer.java:2794) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$ByteBufferNioClientWorker.processRead(GridNioServer.java:1189) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.processSelectedKeysOptimized(GridNioServer.java:2508) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.bodyInternal(GridNioServer.java:2273) ~[classes/:?]
    at org.apache.ignite.internal.util.nio.GridNioServer$AbstractNioClientWorker.body(GridNioServer.java:1910) ~[classes/:?]
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:125) ~[classes/:?]
    ... 1 more
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)