Posted to issues@ignite.apache.org by "Vladimir Steshin (Jira)" <ji...@apache.org> on 2020/05/15 13:35:00 UTC
[jira] [Updated] (IGNITE-13016) Fix backward checking of failed node.
[ https://issues.apache.org/jira/browse/IGNITE-13016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Steshin updated IGNITE-13016:
--------------------------------------
Description:
We should fix 3 drawbacks in the backward checking of a failed node:
1) It has a hardcoded timeout of 100 ms:
{code:java}
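// Defined in ServerImpls; the 100 ms connect timeout below is hardcoded.
private boolean isConnectionRefused(SocketAddress addr) {
    try (Socket sock = new Socket()) {
        sock.connect(addr, 100);
    }
    catch (ConnectException e) {
        // The remote side actively refused the connection.
        return true;
    }
    catch (IOException e) {
        // Timeout or another I/O error: not a definite refusal.
        return false;
    }

    return false;
}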
{code}
2) The maximal interval for checking the previous node in the ring is fixed at double the connection check interval:
{code:java}
TcpDiscoveryHandshakeResponse res =
    new TcpDiscoveryHandshakeResponse(...);
...
// We got message from previous in less than double connection check interval.
boolean ok = rcvdTime + CON_CHECK_INTERVAL * 2 >= now;

if (ok) {
    // Check case when previous node suddenly died. This will speed up
    // node failing.
    ...
}

res.previousNodeAlive(ok);
{code}
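For illustration only, the window check from item 2 could take its limit as a parameter instead of the fixed CON_CHECK_INTERVAL * 2. This is a minimal sketch, not Ignite code: the class PreviousNodeCheck, the method signature and the maxSilenceMs parameter are hypothetical names. Where the value would come from in the configuration is left open.

```java
/** Hypothetical helper, not part of Ignite: window check with a caller-supplied limit. */
final class PreviousNodeCheck {
    private PreviousNodeCheck() {
        // Static helper only.
    }

    /**
     * @param rcvdTime Timestamp (ms) of the last message received from the previous node.
     * @param now Current timestamp (ms).
     * @param maxSilenceMs Assumed configurable window that replaces CON_CHECK_INTERVAL * 2.
     * @return {@code true} if the last message arrived within the window.
     */
    static boolean previousNodeAlive(long rcvdTime, long now, long maxSilenceMs) {
        if (maxSilenceMs < 0)
            throw new IllegalArgumentException("maxSilenceMs must not be negative: " + maxSilenceMs);

        // Same comparison as in the quoted snippet, with the window passed in.
        return rcvdTime + maxSilenceMs >= now;
    }
}
```

With a window of 1000 ms, a message received 500 ms ago keeps the previous node marked alive, while one received 1500 ms ago does not.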
was:
Backward checking of a failed node relies on a hardcoded timeout of 100 ms:
{code:java}
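// Defined in ServerImpls; the 100 ms connect timeout below is hardcoded.
private boolean isConnectionRefused(SocketAddress addr) {
    try (Socket sock = new Socket()) {
        sock.connect(addr, 100);
    }
    catch (ConnectException e) {
        // The remote side actively refused the connection.
        return true;
    }
    catch (IOException e) {
        // Timeout or another I/O error: not a definite refusal.
        return false;
    }

    return false;
}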
{code}
We should bind it to configurable parameters such as IgniteConfiguration.failureDetectionTimeout.
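One possible shape of such a probe, as a sketch only: the connect timeout is passed in by the caller rather than fixed at 100. The class ConnectionProbe and the timeoutMs parameter are hypothetical and not part of Ignite. How the value would be derived from the configured failure detection timeout is an open question here.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.Socket;
import java.net.SocketAddress;

/** Hypothetical sketch, not Ignite code: connection-refused probe with a caller-supplied timeout. */
final class ConnectionProbe {
    private ConnectionProbe() {
        // Static helper only.
    }

    /**
     * @param addr Address of the node to probe.
     * @param timeoutMs Connect timeout in ms (assumed to come from configuration).
     * @return {@code true} only if the connection was actively refused.
     */
    static boolean isConnectionRefused(SocketAddress addr, int timeoutMs) {
        // Socket.connect treats 0 as an infinite timeout, so reject it explicitly.
        if (timeoutMs <= 0)
            throw new IllegalArgumentException("timeoutMs must be positive: " + timeoutMs);

        try (Socket sock = new Socket()) {
            sock.connect(addr, timeoutMs);
        }
        catch (ConnectException e) {
            // The remote side actively refused the connection.
            return true;
        }
        catch (IOException e) {
            // Timeout or another I/O error: not a definite refusal.
            return false;
        }

        // Connected successfully: the address is reachable.
        return false;
    }
}
```

Rejecting non-positive values is a design choice of this sketch: it prevents a misconfigured timeout from turning the probe into an unbounded blocking call.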
Also, the maximal interval to check previous node is
Summary: Fix backward checking of failed node. (was: Remove hardcoded values/timeouts from backward checking of failed node.)
> Fix backward checking of failed node.
> -------------------------------------
>
> Key: IGNITE-13016
> URL: https://issues.apache.org/jira/browse/IGNITE-13016
> Project: Ignite
> Issue Type: Sub-task
> Reporter: Vladimir Steshin
> Assignee: Vladimir Steshin
> Priority: Major
> Labels: iep-45