Posted to issues@ignite.apache.org by "Vladimir Steshin (Jira)" <ji...@apache.org> on 2020/05/15 13:35:00 UTC

[jira] [Updated] (IGNITE-13016) Fix backward checking of failed node.

     [ https://issues.apache.org/jira/browse/IGNITE-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vladimir Steshin updated IGNITE-13016:
--------------------------------------
    Description: 
We should fix 3 drawbacks in the backward checking of a failed node:

1) It has a hardcoded timeout of 100 ms:
{code:java}
        private boolean ServerImpls.isConnectionRefused(SocketAddress addr) {
            try (Socket sock = new Socket()) {
                sock.connect(addr, 100);
            }
            catch (ConnectException e) {
                return true;
            }
            catch (IOException e) {
                return false;
            }

            return false;
        }
{code}

2) The maximal interval to check the previous node in the ring is:

{code:java}
TcpDiscoveryHandshakeResponse res =
    new TcpDiscoveryHandshakeResponse(...);

...

// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;

if (ok) {
    // Check case when previous node suddenly died. This will speed up
    // node failing.
    ...
}

res.previousNodeAlive(ok);
{code}
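
The time-window check in item 2 can be isolated as a small pure function, which makes the dependence on the interval explicit. This is only a sketch: the class name PreviousNodeCheck, the method receivedRecently, and the parameter checkIntervalMs are hypothetical and stand in for whatever configurable value would replace CON_CHECK_INTERVAL. It is not the actual Ignite code.

```java
class PreviousNodeCheck {
    /**
     * Mirrors the expression "rcvdTime + CON_CHECK_INTERVAL * 2 >= now" from the snippet above.
     *
     * @param rcvdTime        time (ms) when the last message from the previous node arrived
     * @param now             current time (ms)
     * @param checkIntervalMs hypothetical configurable interval used instead of the constant
     * @return true if the last message arrived within two check intervals
     */
    static boolean receivedRecently(long rcvdTime, long now, long checkIntervalMs) {
        return rcvdTime + checkIntervalMs * 2 >= now;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();

        // A message received 300 ms ago, checked against a 500 ms interval.
        System.out.println(receivedRecently(now - 300, now, 500));
    }
}
```

With the interval passed in as an argument, the window scales with the configured value instead of being fixed at compile time.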



  was:
Backward checking of a failed node relies on a hardcoded timeout of 100 ms:

{code:java}
        private boolean ServerImpls.isConnectionRefused(SocketAddress addr) {
            try (Socket sock = new Socket()) {
                sock.connect(addr, 100);
            }
            catch (ConnectException e) {
                return true;
            }
            catch (IOException e) {
                return false;
            }

            return false;
        }
{code}

We should make it bound to configurable params like IgniteConfiguration.failureDetectionTimeout.
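
As a rough illustration of that idea, the connect timeout could be passed in as a parameter instead of being fixed at 100. This is a minimal sketch, not the actual fix: the class name ConnectionCheck and the parameter timeoutMs are hypothetical, and how the value would be derived from IgniteConfiguration.failureDetectionTimeout is left open here.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketAddress;

class ConnectionCheck {
    /**
     * Same logic as the snippet above, but with the timeout supplied by the caller.
     *
     * @param addr      address to probe
     * @param timeoutMs hypothetical configurable connect timeout in milliseconds
     *                  (Socket.connect treats 0 as "no timeout" and rejects negative values)
     * @return true only if the connection attempt was actively refused
     */
    static boolean isConnectionRefused(SocketAddress addr, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(addr, timeoutMs);
        }
        catch (ConnectException e) {
            return true;
        }
        catch (IOException e) {
            // Includes SocketTimeoutException: a timeout is not treated as a refusal.
            return false;
        }

        return false;
    }

    public static void main(String[] args) throws IOException {
        SocketAddress addr;

        // Probe a loopback port while something is listening on it, then again after it is closed.
        try (ServerSocket srv = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            addr = srv.getLocalSocketAddress();

            System.out.println("refused while listening: " + isConnectionRefused(addr, 500));
        }

        System.out.println("refused after close: " + isConnectionRefused(addr, 500));
    }
}
```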

Also, the maximal interval to check the previous node is 


        Summary: Fix backward checking of failed node.  (was: Remove hardcoded values/timeouts from backward checking of failed node.)

> Fix backward checking of failed node.
> -------------------------------------
>
>                 Key: IGNITE-13016
>                 URL: https://issues.apache.org/jira/browse/IGNITE-13016
>             Project: Ignite
>          Issue Type: Sub-task
>            Reporter: Vladimir Steshin
>            Assignee: Vladimir Steshin
>            Priority: Major
>              Labels: iep-45
>
> We should fix 3 drawbacks in the backward checking of a failed node:
> 1) It has a hardcoded timeout of 100 ms:
> {code:java}
>         private boolean ServerImpls.isConnectionRefused(SocketAddress addr) {
>             try (Socket sock = new Socket()) {
>                 sock.connect(addr, 100);
>             }
>             catch (ConnectException e) {
>                 return true;
>             }
>             catch (IOException e) {
>                 return false;
>             }
>             return false;
>         }
> {code}
> 2) The maximal interval to check the previous node in the ring is:
> {code:java}
> TcpDiscoveryHandshakeResponse res =
>     new TcpDiscoveryHandshakeResponse(...);
> ...
> // We got message from previous in less than double connection check interval.
> boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;
> if (ok) {
>     // Check case when previous node suddenly died. This will speed up
>     // node failing.
>     ...
> }
> res.previousNodeAlive(ok);
> {code}


