Posted to issues@ignite.apache.org by "Ilya Kasnacheev (Jira)" <ji...@apache.org> on 2021/05/12 11:26:00 UTC

[jira] [Created] (IGNITE-14711) Client discovery thread interrupt/stop causes endless communication reconnect attempt

Ilya Kasnacheev created IGNITE-14711:
----------------------------------------

             Summary: Client discovery thread interrupt/stop causes endless communication reconnect attempt
                 Key: IGNITE-14711
                 URL: https://issues.apache.org/jira/browse/IGNITE-14711
             Project: Ignite
          Issue Type: Bug
          Components: networking
    Affects Versions: 2.10
            Reporter: Ilya Kasnacheev


Original issue: if the tcp-client-disco-sock-reader thread dies on a client node, the node never disconnects from the cluster despite NODE_FAILED. Instead, it endlessly tries to open communication connections to the server, getting "Remote node does not observe current node in topology" exceptions on the client and "Close incoming connection, unknown node" on the server.

Generalized issue: calling stop() or interrupt() on discovery threads causes the cluster to hang in many cases, whereas any such node is expected to do one of the following:
* Restart the thread and continue normally
* Disconnect from the cluster to re-establish the discovery connection
* Stop and close all remaining threads
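
For illustration only, the trigger could be sketched roughly as below. This is not the attached reproducer. It uses only JDK thread APIs (no Ignite calls), and it assumes that the discovery thread's name starts with the tcp-client-disco-sock-reader prefix mentioned above. The class and method names (DiscoThreadInterrupter, findThreads, interruptThreads) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: interrupts live JVM threads selected by name prefix. */
final class DiscoThreadInterrupter {
    /** Returns all live threads whose name starts with the given prefix. */
    static List<Thread> findThreads(String namePrefix) {
        List<Thread> found = new ArrayList<>();

        // getAllStackTraces() returns a map keyed by every live thread in the JVM.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(namePrefix))
                found.add(t);
        }

        return found;
    }

    /** Interrupts every matching thread and returns how many were interrupted. */
    static int interruptThreads(String namePrefix) {
        List<Thread> found = findThreads(namePrefix);

        for (Thread t : found)
            t.interrupt();

        return found.size();
    }

    public static void main(String[] args) {
        // Assumption: the prefix is taken from the thread name quoted in this issue.
        int n = interruptThreads("tcp-client-disco-sock-reader");

        System.out.println("Interrupted threads: " + n);
    }
}
```

To be of any use, such a call would have to run inside the client node's JVM after the node has joined the cluster. Run standalone, it simply finds no matching threads.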

See the attached reproducer.


