Posted to dev@ignite.apache.org by Denis Magda <dm...@gridgain.com> on 2015/07/23 12:31:58 UTC

Finishing work on IGNITE-752 (Speed up failure detection)

Igniters,

This week I've been working on an improvement that detects failures at the 
discovery, communication, and network levels of cluster nodes as quickly as 
possible and lets the user tune this behavior with a single configuration 
parameter.

Failure detection has existed in Ignite for a long time, and the user can 
already tune it, BUT around *10* configuration parameters have to be set up 
to achieve the desired result.

Once IGNITE-752 is merged into the main development branch, all of this 
behavior can be controlled with a single parameter: 
IgniteConfiguration.failureDetectionThreshold.

Setting the failure detection threshold on a server node makes it possible 
to detect failed nodes in the cluster topology within a time equal to the 
threshold's value and to switch to, or keep working with, only the live 
nodes.
Setting the threshold on a client node lets us detect connection failures 
between the client and its router node (a server node that is part of the 
topology).
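
To illustrate the idea of a single threshold driving failure detection, here 
is a minimal, self-contained sketch. It is a toy model, not the code from 
IGNITE-752: the class FailureDetectorSketch, its methods, the heartbeat-style 
bookkeeping, and the 10-second value are all assumptions made up for this 
example, and it does not use any Ignite API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model of threshold-based failure detection.
 * Hypothetical illustration only; this is NOT how Ignite implements it.
 */
public class FailureDetectorSketch {
    /** The single tuning knob: maximum silence tolerated from a node, in ms. */
    private final long thresholdMs;

    /** Last time (in ms) each node was heard from. */
    private final Map<String, Long> lastSeenMs = new HashMap<>();

    public FailureDetectorSketch(long thresholdMs) {
        if (thresholdMs <= 0)
            throw new IllegalArgumentException("thresholdMs must be positive");
        this.thresholdMs = thresholdMs;
    }

    /** Records that a node responded at the given time. */
    public void onHeartbeat(String nodeId, long nowMs) {
        lastSeenMs.put(nodeId, nowMs);
    }

    /** Returns the sorted IDs of nodes silent for longer than the threshold. */
    public List<String> failedNodes(long nowMs) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastSeenMs.entrySet()) {
            if (nowMs - e.getValue() > thresholdMs)
                failed.add(e.getKey());
        }
        Collections.sort(failed);
        return failed;
    }

    public static void main(String[] args) {
        // Assumed threshold of 10 seconds, chosen only for the example.
        FailureDetectorSketch detector = new FailureDetectorSketch(10_000);

        detector.onHeartbeat("server-1", 0);
        detector.onHeartbeat("server-2", 0);
        detector.onHeartbeat("server-1", 8_000); // server-2 stays silent

        System.out.println(detector.failedNodes(9_000));  // prints []
        System.out.println(detector.failedNodes(12_000)); // prints [server-2]
    }
}
```

The point of the sketch is only that one value bounds how long a silent node 
stays in the topology; the real change presumably derives the lower-level 
discovery and communication settings from that value.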

In addition, a number of other improvements and simplifications were made 
at the level of TcpDiscoverySpi and TcpCommunicationSpi. The changes are 
aggregated here:
https://issues.apache.org/jira/browse/IGNITE-752

The general review has passed. However, if anyone else wants to review the 
changes or has any thoughts or suggestions, don't hesitate to share them.

Dmitriy S, I would like to ask you to review the documentation changes in 
any case before I merge.


Regards,
Denis


Re: (javadoc) Finishing work on IGNITE-752 (Speed up failure detection)

Posted by Dmitriy Setrakyan <ds...@apache.org>.
Denis,

I added my comments to the ticket:
https://issues.apache.org/jira/browse/IGNITE-752

D.

On Thu, Jul 23, 2015 at 9:11 PM, Denis Magda <dm...@gridgain.com> wrote:

> Igniters,
>
> Could someone review the Javadoc changes in public classes/interfaces? It's
> enough to look at the changes in the following places:
> - IgniteConfiguration (get/setFailureDetectionThreshold);
> - Header of TcpDiscoverySpi, TcpCommunicationSpi.
>
> —
> Denis

Re: (javadoc) Finishing work on IGNITE-752 (Speed up failure detection)

Posted by Denis Magda <dm...@gridgain.com>.
Igniters,

Could someone review the Javadoc changes in public classes/interfaces? It's enough to look at the changes in the following places:
- IgniteConfiguration (get/setFailureDetectionThreshold);
- Header of TcpDiscoverySpi, TcpCommunicationSpi.

—
Denis
