
Posted to dev@ignite.apache.org by Александр Меньшиков <sh...@gmail.com> on 2016/12/22 10:59:29 UTC

Sort nodes in the ring in order to minimize the number of reconnections

Hello everyone,

As far as I know, nodes are connected in a ring. For example, if I have 6
nodes named A, B, C, D, E and F, they can be connected in the ring in any
possible order: A-B-C-D-E-F-A, A-F-B-E-C-D-A, etc. If some node falls out
of the topology, its neighboring nodes must reconnect. Suppose nodes A, B
and C are located in one physical location and D, E and F in another, and
at some point the two locations become unreachable from each other. Then
the number of reconnections depends on the ring order. The best case is a
ring like A-B-CxD-E-FxA ('x' means a broken link): we get only one
reconnection (C reconnects to A, or F reconnects to D, depending on which
part of the cluster stays alive). But currently the case AxFxBxExCxDxA is
also possible, and then we get many reconnections (A to B, B to C, C to A;
in general n/2 reconnections, where n is the number of nodes). So I would
like to add something that ensures nodes are always well ordered in the
ring (A-B-C-...-Z-A).
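The reconnection counts in this example can be checked with a small Java model. It is a sketch under simplifying assumptions, not Ignite code: the ring is a list of node names in ring order (the last node links back to the first), only the surviving location is considered, and every survivor whose next node is in the lost location has to reconnect. The class name RingReconnections is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Toy model of the example above: 'ring' lists node names in ring order and the
 * last node links back to the first. When the surviving location is cut off,
 * every survivor whose next node is not a survivor has to reconnect.
 */
final class RingReconnections {

    static int count(List<String> ring, Set<String> survivors) {
        int reconnects = 0;
        for (int i = 0; i < ring.size(); i++) {
            String node = ring.get(i);
            String next = ring.get((i + 1) % ring.size()); // wrap-around link
            if (survivors.contains(node) && !survivors.contains(next))
                reconnects++;
        }
        return reconnects;
    }

    public static void main(String[] args) {
        Set<String> alive = new HashSet<>(Arrays.asList("A", "B", "C"));
        // Grouped ring A-B-C-D-E-F-A: only C loses its next node.
        System.out.println(count(Arrays.asList("A", "B", "C", "D", "E", "F"), alive)); // 1
        // Interleaved ring A-F-B-E-C-D-A: A, B and C all lose theirs.
        System.out.println(count(Arrays.asList("A", "F", "B", "E", "C", "D"), alive)); // 3
    }
}
```

With 6 nodes the interleaved ring gives 3 reconnections, which matches the n/2 estimate above.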

Of course, in the real world there can be multiple levels of physical closeness.

In my opinion, it is enough to add one 'int' parameter to the
configuration (named something like 'ExtraNodeOrder') and to change the
node comparison method so that it first compares 'ExtraNodeOrder' and then
falls back to the old criterion (as far as I know, Ignite uses the topology
version). Users who have multiple levels of physical closeness can then use
different bits, for example the high 16 bits for the DC number and the low
16 bits for the rack.
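To make the bit-packing idea concrete, here is a minimal Java sketch. It assumes a hypothetical Node class carrying an extraNodeOrder value next to the existing join order; it is not Ignite's TcpDiscoveryNode, and 'ExtraNodeOrder' is only the name proposed above. The packed value is compared first and the old order only breaks ties:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch of the proposed 'ExtraNodeOrder' comparison; Node is a hypothetical stand-in. */
final class ExtraNodeOrderSketch {

    static final class Node {
        final String name;
        final int extraNodeOrder; // user-configured value, e.g. pack(dc, rack)
        final long order;         // old criterion: order assigned on join

        Node(String name, int extraNodeOrder, long order) {
            this.name = name;
            this.extraNodeOrder = extraNodeOrder;
            this.order = order;
        }
    }

    /** High 16 bits hold the DC number, low 16 bits hold the rack number. */
    static int pack(int dc, int rack) {
        if (dc < 0 || dc > 0xFFFF || rack < 0 || rack > 0xFFFF)
            throw new IllegalArgumentException("dc and rack must fit in 16 bits");
        return (dc << 16) | rack;
    }

    /** ExtraNodeOrder first (unsigned, so DC numbers >= 0x8000 sort after smaller ones), then join order. */
    static final Comparator<Node> RING_ORDER = (a, b) -> {
        int c = Integer.compareUnsigned(a.extraNodeOrder, b.extraNodeOrder);
        return c != 0 ? c : Long.compare(a.order, b.order);
    };

    static List<String> ringNames(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(RING_ORDER);
        List<String> names = new ArrayList<>();
        for (Node n : sorted)
            names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
            new Node("dc1-rack2", pack(1, 2), 1),
            new Node("dc0-rack5-a", pack(0, 5), 2),
            new Node("dc1-rack1", pack(1, 1), 3),
            new Node("dc0-rack5-b", pack(0, 5), 4));
        // Prints [dc0-rack5-a, dc0-rack5-b, dc1-rack1, dc1-rack2]
        System.out.println(ringNames(nodes));
    }
}
```

The unsigned comparison is a design choice of this sketch: with a plain signed compare, a DC number of 0x8000 or more would turn the packed int negative and sort that DC first.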

Alternatively, we can add an array of 'int' to the configuration and
compare nodes element by element, from the zero element to the last.
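The array variant amounts to a lexicographic comparison. A sketch, assuming each node carries a hypothetical int[] such as {dc, rack}; how arrays of different lengths should compare is not specified above, so here a shorter prefix simply sorts first, and nodes with equal arrays would still fall back to the old order:

```java
import java.util.Comparator;

/** Lexicographic comparison of per-node int arrays, e.g. {dc, rack, host}; hypothetical helper. */
final class ArrayNodeOrder {

    static final Comparator<int[]> LEXICOGRAPHIC = (a, b) -> {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            int c = Integer.compare(a[i], b[i]); // first differing element decides
            if (c != 0)
                return c;
        }
        // Assumption: if one array is a prefix of the other, the shorter one sorts first.
        return Integer.compare(a.length, b.length);
    };

    public static void main(String[] args) {
        System.out.println(LEXICOGRAPHIC.compare(new int[] {1, 2}, new int[] {1, 3}) < 0); // true: same DC, lower rack
        System.out.println(LEXICOGRAPHIC.compare(new int[] {2, 0}, new int[] {1, 9}) > 0); // true: DC decides first
    }
}
```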

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Denis Magda <dm...@apache.org>.

Alexander,

This is a different matter and looks unrelated to the discussion we are having here.

A transaction will not be rolled back the way you're describing. It will be either committed once or rolled back once. There can and will be inter-node communication when something fails during the commit phase, but this depends on how the affinity function distributes keys and partitions, not on how the nodes are connected at the discovery SPI layer.

Here you can learn more about how the two-phase commit protocol handles failures:
http://gridgain.blogspot.com/2014/09/two-phase-commit-for-in-memory-caches.html

—
Denis

> On Dec 23, 2016, at 12:24 PM, Александр Меньшиков <sh...@gmail.com> wrote:
> 
> What actually worries me is the following situation:
> 
> As I said, suppose we have the ring A->F->B->E->C->D->A, and the connection
> between A, B, C and D, E, F is broken. The nodes will not detect the
> unavailability of the other nodes at the same time, and meanwhile the client
> keeps performing transactional operations. Transactions may then be rolled
> back many times, in the following sequence of events:
> 
> 0. Everything is fine: A->F->B->E->C->D->A.
> 1. The connection between A, B, C and D, E, F is broken.
> 2. "A" sees that "F" has fallen out of the topology and reconnects to "B".
> All transactions using "F" are rolled back and restarted on a backup node
> ("B", for example).
> 3. After that, "B" sees that "E" has fallen out of the topology and
> reconnects to "C". All transactions using "E" are rolled back and restarted
> on a backup node ("C", for example).
> 4. After that, "C" sees that "D" has fallen out of the topology and
> reconnects to "A". All transactions using "D" are rolled back and restarted
> on a backup node ("A", for example).
> 
> So we get 3 separate sets of rollbacks instead of one.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

What actually worries me is the following situation:

As I said, suppose we have the ring A->F->B->E->C->D->A, and the connection
between A, B, C and D, E, F is broken. The nodes will not detect the
unavailability of the other nodes at the same time, and meanwhile the client
keeps performing transactional operations. Transactions may then be rolled
back many times, in the following sequence of events:

0. Everything is fine: A->F->B->E->C->D->A.
1. The connection between A, B, C and D, E, F is broken.
2. "A" sees that "F" has fallen out of the topology and reconnects to "B".
All transactions using "F" are rolled back and restarted on a backup node
("B", for example).
3. After that, "B" sees that "E" has fallen out of the topology and
reconnects to "C". All transactions using "E" are rolled back and restarted
on a backup node ("C", for example).
4. After that, "C" sees that "D" has fallen out of the topology and
reconnects to "A". All transactions using "D" are rolled back and restarted
on a backup node ("A", for example).

So we get 3 separate sets of rollbacks instead of one.

2016-12-23 22:43 GMT+03:00 Valentin Kulichenko <valentin.kulichenko@gmail.com>:

> Hi Vyacheslav,
>
> Discovery logic is encapsulated in TcpDiscoverySpi.
> TcpDiscoveryMulticastIpFinder is one of many implementations of the IP finder.
> The only purpose of the IP finder is to provide a list of addresses where a
> node can send its initial join request, and the fact that it sends this
> initial request to node A doesn't actually mean that it will be connected to
> A within the ring. Having said that, I doubt the IP finder will be affected
> in any way if the discussed change is implemented.
>
> The discovery protocol already maintains consistent information about the
> ring, so any node in the topology already knows everything about the other
> nodes, including their order in the ring. So at the discovery level it
> should not be very difficult to customize where a joining node is placed on
> the ring.
>
> However, here is the concern I have. Currently, when a new node joins, the
> coordinator assigns an order number to it (e.g. if we already have nodes
> 1, 2 and 3, the new node will have order 4). This node will then be the
> last one on the ring, i.e. nodes are always ordered in the ring by this
> order number (1->2->3->4->1). If we change this, we will basically allow a
> node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
> sure this is going to cause issues, but it sounds dangerous.
>
> Yakov, can you please chime in and share your thoughts on this?
>
> -Val
>
> On Fri, Dec 23, 2016 at 2:46 AM, Vyacheslav Daradur <da...@gmail.com>
> wrote:
>
> > Thanks for the reply.
> >
> > I have some questions:
> >
> > 1. Where is the logic of Ignite cluster building implemented? In DiscoverySpi
> > and TcpDiscoveryMulticastIpFinder?
> >
> > 2. Which standard Ignite metrics would you recommend using for
> > node ordering?
> >
> > 2016-12-22 19:08 GMT+03:00 Dmitriy Setrakyan <ds...@apache.org>:
> >
> > > I think having some user-defined ordering can be beneficial. However, we
> > > are only talking about the node discovery protocol here, which maintains
> > > the cluster. All other communication between nodes happens directly (it
> > > does not go through the ring).
> > >
> > > D.
> > >
> > > On Thu, Dec 22, 2016 at 6:32 AM, Vyacheslav Daradur <daradurvs@gmail.com>
> > > wrote:
> > >
> > > > Hello, Alex!
> > > >
> > > > I think it is a great idea.
> > > >
> > > > I suggest building the connections between nodes based on weight (or
> > > > priority).
> > > >
> > > > For example, ordering by latency:
> > > > - nodes on one host = 1
> > > > - nodes in one rack-blade = 2
> > > > - nodes in one server-rack = 3
> > > > - nodes in one physical cluster = 4
> > > > - nodes in one subnet = 5
> > > > - etc.
> > > >
> > > > Maybe it would be better to use some metrics from the ClusterMetrics
> > > > interface.
> > > >
> > > > The ordering algorithm can be implemented in a class such as a
> > > > Comparator and used when we build the cluster or choose a place for a
> > > > new node.
> > > >
> > > > --
> > > > With best regards,
> > > > Vyacheslav Daradur
>
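Valentin's description of the current placement (the coordinator hands out increasing order numbers and the ring follows them) can be modeled in a few lines of Java. This is a simplified sketch, not Ignite's TcpDiscoveryNodesRing: OrderedRing and its methods are hypothetical, and nodes are represented only by their order numbers.

```java
import java.util.TreeSet;

/** Hypothetical model of today's placement: the ring simply follows join order numbers. */
final class OrderedRing {
    private final TreeSet<Long> orders = new TreeSet<>();
    private long lastOrder; // highest order number handed out so far

    /** Coordinator side: a joining node gets the next order number and lands at the end of the ring. */
    long join() {
        long order = ++lastOrder;
        orders.add(order);
        return order;
    }

    void leave(long order) {
        orders.remove(order);
    }

    /** Next node on the ring: the next larger order number, wrapping around to the smallest. */
    long next(long order) {
        Long higher = orders.higher(order);
        return higher != null ? higher : orders.first();
    }

    public static void main(String[] args) {
        OrderedRing ring = new OrderedRing();
        for (int i = 0; i < 4; i++)
            ring.join();                  // ring 1->2->3->4->1
        System.out.println(ring.next(4)); // 1 (wrap-around)
        ring.leave(2);
        System.out.println(ring.next(1)); // 3 (node 1 skips the departed node 2)
        System.out.println(ring.join());  // 5, placed after 4
    }
}
```

In this model every joining node ends up just before the wrap-around point, which is the invariant that a custom placement such as 1->2->4->3->1 would break.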

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

>>
I was thinking of latency values.

Latency between nodes on the same host < latency between nodes in the same
rack < latency between nodes in the same subnet < etc.
>>

Vyacheslav, I agree that latency increases in the way you describe, but I
still don't understand how we would use this information in discovery.
Latency may vary over time depending on many factors. I still think that
the arc approach is more intuitive for users and easier to implement.

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Vyacheslav Daradur <da...@gmail.com>.

>>
Vyacheslav, please elaborate on how we can determine whether we are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.
>>

I was thinking of latency values.

Latency between nodes on the same host < latency between nodes in the same
rack < latency between nodes in the same subnet < etc.


2016-12-26 12:20 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:

> >>
> For example, ordering by latency:
> - nodes on one host = 1
> - nodes in one rack-blade = 2
> - nodes in one server-rack = 3
> - nodes in one physical cluster = 4
> - nodes in one subnet = 5
> - etc.
>
> Maybe it would be better to use some metrics from the ClusterMetrics interface.
>
> The ordering algorithm can be implemented in a class such as a Comparator
> and used when we build the cluster or choose a place for a new node.
> >>
>
> Vyacheslav, please elaborate on how we can determine whether we are on the
> same rack. I am not sure this is possible in the general case. Please see my
> suggestions below.
>
> >>
> However, here is the concern I have. Currently, when a new node joins, the
> coordinator assigns an order number to it (e.g. if we already have nodes
> 1, 2 and 3, the new node will have order 4). This node will then be the
> last one on the ring, i.e. nodes are always ordered in the ring by this
> order number (1->2->3->4->1). If we change this, we will basically allow a
> node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
> sure this is going to cause issues, but it sounds dangerous.
>
> Yakov, can you please chime in and share your thoughts on this?
> >>
>
> I don't think this will cause issues. Node ordering and placement are
> implemented in TcpDiscoveryNodesRing, and I think we will just need to
> alter the logic of
> org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>).
>
> As for the design, I would suggest the following.
>
> 1. Users should be able to define an ARC_ID for a node. I suggest "arc"
> for this since we are using the "ring" concept. This will be the
> highest-priority characteristic for node placement. By default arc_id is 0,
> and it can be set with the IGNITE_DISCO_ARC_ID system property, an
> environment variable, or TcpDiscoverySpi.setArcId() (a new method).
> So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
> to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here arcs
> can represent different racks or data centers.
>
> I am strongly against giving users the ability to point to an exact place
> in the ring with an interface like [int getIndex(Node newNode,
> List<Node> currentRing)]. This is very error-prone and may require tricky
> consistency checks just to make sure that the implementation of this
> interface is consistent across the topology.
> With the "arcs" approach, users can automatically assign proper ids based
> on the physical network topology and network routes.
>
> 2. Subnet is the 2nd-priority parameter. Nodes on the same subnet should be
> placed side by side within the same arc.
>
> 3. Physical host is the 3rd-priority parameter. Nodes on the same physical
> host should automatically be placed together within the same arc.
>
> 4. The new mode involving points 1-3 should become the default, and we
> should also provide the ability to switch to the current mode, which should
> become legacy.
>
> --Yakov
>
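Yakov's four-level design can be expressed as a chained Java comparator. This is a minimal sketch, assuming a hypothetical Node class with arcId, subnet, host and order fields; subnet and host are compared as plain strings only so that nodes with equal values end up adjacent, and none of these names are Ignite types:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Sketch of the proposed placement priorities: arc, then subnet, then host, then legacy order. */
final class ArcRingSketch {

    static final class Node {
        final String name;
        final int arcId;     // user-defined ARC_ID, 0 by default
        final String subnet; // e.g. "10.0.1.0/24"
        final String host;   // physical host identifier
        final long order;    // legacy join order, the final tie-breaker

        Node(String name, int arcId, String subnet, String host, long order) {
            this.name = name;
            this.arcId = arcId;
            this.subnet = subnet;
            this.host = host;
            this.order = order;
        }
    }

    static final Comparator<Node> RING_ORDER =
        Comparator.comparingInt((Node n) -> n.arcId)
            .thenComparing((Node n) -> n.subnet)
            .thenComparing((Node n) -> n.host)
            .thenComparingLong((Node n) -> n.order);

    static List<String> ringNames(List<Node> nodes) {
        List<Node> sorted = new ArrayList<>(nodes);
        sorted.sort(RING_ORDER);
        List<String> names = new ArrayList<>();
        for (Node n : sorted)
            names.add(n.name);
        return names;
    }

    public static void main(String[] args) {
        // Yakov's example: A, D, G in arc 1 and B, Z in arc 5, listed in join order.
        List<Node> nodes = Arrays.asList(
            new Node("A", 1, "10.0.1.0/24", "host-a", 1),
            new Node("B", 5, "10.0.5.0/24", "host-b", 2),
            new Node("D", 1, "10.0.1.0/24", "host-d", 3),
            new Node("G", 1, "10.0.1.0/24", "host-g", 4),
            new Node("Z", 5, "10.0.5.0/24", "host-z", 5));
        System.out.println(ringNames(nodes)); // [A, D, G, B, Z], i.e. A->D->G->B->Z->A
    }
}
```

Because every comparison uses only values that all nodes see identically, each node would build the same ring, which is the consistency property Yakov wants to keep out of user code.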

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Dmitriy Setrakyan <ds...@apache.org>.

I agree with Yakov. Having an integer as the region ID should be sufficient
to support all the use cases.


Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

Guys, I have just commented in the ticket.

I suggest unscheduling IGNITE-4501 from 2.1. Let's return to it at some
point.

Thanks!

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

The code review needs to be done by February 17 if we want to get this
feature into version 1.9.

2017-02-08 22:14 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:

> Done. Please look.
>
> JIRA: https://issues.apache.org/jira/browse/IGNITE-4501
> PR: https://github.com/apache/ignite/pull/1436/files
> Tests: http://ci.ignite.apache.org/project.html?projectId=IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1436/head
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Done. Please look.

JIRA: https://issues.apache.org/jira/browse/IGNITE-4501
PR: https://github.com/apache/ignite/pull/1436/files
Tests: http://ci.ignite.apache.org/project.html?projectId=IgniteTests&tab=projectOverview&branch_IgniteTests=pull/1436/head

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Igor, I have thought about the approach you are talking about. It requires
adding a new field, named something like "sortedNodes", with custom
ordering, which would hold the same items as the "nodes" field, because
"nodes" is used with the default ordering in other methods. It has these
advantages:

1. The "nextNode" method will look simpler.
2. The "nextNode" method will work faster, because TreeSet#higher() can be
used. That possibility was not used in the original code, though, and I
don't know why.


But it also has some disadvantages, because the new "sortedNodes" field
will be strongly coupled with "nodes":
1. All the code that modifies "nodes" in 4 other methods has to be
duplicated, which will decrease maintainability.
2. The "nodes" field is used with a copy-on-write algorithm, so the states
of "nodes" and "sortedNodes" can become inconsistent. Maybe that's okay; I
just don't know. But in any case it may become a problem in the future.

So my opinion is that the "presorted" approach can work a little bit faster
(although the number of nodes can never be so big that O(log n) versus O(n)
makes a real difference), but code complexity will increase, because it adds
another logical coupling inside the whole TcpDiscoveryNodesRing class.

Yakov, can you settle our argument?
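The trade-off between Igor's presorted collection and the current scan can be sketched generically in Java. Both helpers below are hypothetical (they are not the TcpDiscoveryNodesRing#nextNode code) and assume the next node is the smallest node that sorts after the current one under the comparator, wrapping around to the smallest node overall:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Two hypothetical ways to find the next ring node under a custom comparator. */
final class NextNodeSketch {

    /** O(n) per call: scan an unsorted collection for the smallest node after 'current'. */
    static <T> T nextByScan(Collection<T> nodes, T current, Comparator<? super T> cmp) {
        T best = null; // smallest node that sorts after 'current'
        T min = null;  // smallest node overall, used when we wrap around
        for (T n : nodes) {
            if (min == null || cmp.compare(n, min) < 0)
                min = n;
            if (cmp.compare(n, current) > 0 && (best == null || cmp.compare(n, best) < 0))
                best = n;
        }
        return best != null ? best : min;
    }

    /** O(log n) per call, but the TreeSet must be kept in sync with the main node collection. */
    static <T> T nextBySortedSet(TreeSet<T> sorted, T current) {
        T higher = sorted.higher(current);
        return higher != null ? higher : sorted.first();
    }

    public static void main(String[] args) {
        Comparator<Integer> cmp = Comparator.reverseOrder(); // any consistent custom order
        List<Integer> nodes = Arrays.asList(3, 1, 5, 2, 4);
        TreeSet<Integer> sorted = new TreeSet<>(cmp);
        sorted.addAll(nodes);
        for (Integer n : nodes) {
            // Both approaches agree: the ring is 5->4->3->2->1->5.
            System.out.println(n + " -> " + nextByScan(nodes, n, cmp) + " / " + nextBySortedSet(sorted, n));
        }
    }
}
```

The presorted variant is only correct while the TreeSet holds exactly the same nodes as the main collection, which is the coupling described above.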

2017-01-20 16:30 GMT+03:00 Игорь Г <fr...@gmail.com>:

> Alexander, maybe you should use a presorted collection in
> TcpDiscoveryNodesRing.nextNode instead of iterating through an unsorted one
> every time?
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Игорь Г <fr...@gmail.com>.

Alexander, maybe you should use a presorted collection in
TcpDiscoveryNodesRing.nextNode instead of iterating through an unsorted one
every time?

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Yakov, I have changed the implementation. Please look at my code again.

https://github.com/apache/ignite/pull/1436

2017-01-19 15:37 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:

> Alexander, as far as I remember, we talked about having the cluster id set
> via the TcpDiscoverySpi configuration and also via a system property.
>
> What do you mean by "existence of sufficient comparator"? If we require
> that the attribute is an integer, then we don't have any problems. If it is
> not an integer, then you should throw an exception on start.
>
> I repeat this once again: we need to (1) prevent our users from falling
> into terrible discovery issues (which are hard to debug) if they provide
> inconsistent comparators on different nodes, but we also need to have
> (2) full flexibility and control over node ordering. I think if we don't
> have a comparator in the public API, then we are still OK with both points
> above. Therefore, I would like you to implement this feature using the
> approach we agreed on before. Let me know if you want me to take a look at
> the code before you proceed.
>
> --Yakov
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

Alexander, as far as I remember, we talked about having the cluster id set
via the TcpDiscoverySpi configuration and also via a system property.

What do you mean by "existence of sufficient comparator"? If we require
that the attribute is an integer, then we don't have any problems. If it is
not an integer, then you should throw an exception on start.

I repeat this once again: we need to (1) prevent our users from falling
into terrible discovery issues (which are hard to debug) if they provide
inconsistent comparators on different nodes, but we also need to have
(2) full flexibility and control over node ordering. I think if we don't
have a comparator in the public API, then we are still OK with both points
above. Therefore, I would like you to implement this feature using the
approach we agreed on before. Let me know if you want me to take a look at
the code before you proceed.

--Yakov
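Yakov's "integer or fail on start" rule could look roughly like the following Java sketch. The helper name, the Integer/String handling and the default of 0 (borrowed from the earlier arc_id proposal) are assumptions for illustration, not the code in the PR; CLUSTER_REGION_ID is used only as the attribute name discussed in this thread.

```java
/** Hypothetical start-up check for an integer region attribute (not the PR's code). */
final class RegionIdValidation {

    static int requireIntRegionId(String attrName, Object value) {
        if (value == null)
            return 0; // assumption: unset means the default region 0
        if (value instanceof Integer)
            return (Integer) value;
        if (value instanceof String) {
            try {
                return Integer.parseInt(((String) value).trim());
            }
            catch (NumberFormatException ignored) {
                // Not a number: fall through and fail below.
            }
        }
        throw new IllegalStateException(
            "Node attribute " + attrName + " must be an integer, but was: " + value);
    }

    public static void main(String[] args) {
        System.out.println(requireIntRegionId("CLUSTER_REGION_ID", "5"));  // 5
        System.out.println(requireIntRegionId("CLUSTER_REGION_ID", null)); // 0
        try {
            requireIntRegionId("CLUSTER_REGION_ID", "rack-1");
        }
        catch (IllegalStateException e) {
            System.out.println("rejected on start: " + e.getMessage());
        }
    }
}
```

Since ordering then reduces to comparing plain integers, no user-supplied comparison logic runs on any node.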

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Yakov, as I understand it, we need to add a CLUSTER_REGION_ID for each node
in the config file, and in fact use some kind of sorting in the nextNode
method (a search for extreme values, to be exact). The existence of a valid
comparator is a sufficient condition for sorting nodes to build a new,
correct ring. So I thought we would not get any extra benefits (performance
or maintainability) if we took away the users' ability to set their own
sorting logic. The code would be similar in both variants. I thought that
showing this variant would make it easier to see that it is okay.
But if not, I can quickly change the code.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

Alexander, I was against any comparator and user-defined logic precisely
because the comparison may be inconsistent. After a long discussion and
reaching consensus, you implemented the approach with a comparator. Can you
please explain why you did not just add logic to compare the value of the
CLUSTER_REGION_ID node attribute?

--Yakov

2017-01-18 16:48 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:

> I have done the following:
>
> -- Added a Comparator<TcpDiscoveryNode> nodeComparator field to
> TcpDiscoverySpi, so custom comparators can be loaded from the config file
> as a bean.
> -- Added an implementation with the old behavior: BaseNodeComparator.
> -- Added a region ID implementation: RegionNodeComparator, which takes a map
> from IP address to region ID in its constructor.
> -- Modified TcpDiscoveryNodesRing#nextNode to use nodeComparator to find the
> next node.
>
> You can see it in the PR: https://github.com/apache/ignite/pull/1436
>
> The main question is: how do I test it?
>
> For my local test, I just replaced BaseNodeComparator with this odd
> comparator:
>
> new Comparator<TcpDiscoveryNode>() {
>     @Override
>     public int compare(TcpDiscoveryNode t1, TcpDiscoveryNode t2) {
>         // Shuffle nodes consistently: sort by (internalOrder * 3 + 13) % 4,
>         // falling back to the default node order when the keys are equal.
>         final int ans = Long.compare((t1.internalOrder() * 3L + 13L) % 4L,
>             (t2.internalOrder() * 3L + 13L) % 4L);
>         return (ans == 0) ? t1.compareTo(t2) : ans;
>     }
> };
>
> It looks scary, but in fact it just shuffles nodes consistently. If you
> have 4 nodes with topology versions 1, 2, 3 and 4, the ring will be 1-4-3-2.
>
> So I think if we just use this shuffle comparator in the old tests and
> nothing goes wrong, that's good enough.
>
> But in any case, I don't know how to add that to the tests.
>
> And maybe we need some tests for custom comparators. But in fact,
> comparators just need to be valid Java comparators and work the same way on
> all nodes.
>
> Any comments are welcome.
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

I have done the following:

-- Added a Comparator<TcpDiscoveryNode> nodeComparator field to
TcpDiscoverySpi, so custom comparators can be loaded from the config file
as a bean.
-- Added an implementation with the old behavior: BaseNodeComparator.
-- Added a region ID implementation: RegionNodeComparator, which takes a map
from IP address to region ID in its constructor.
-- Modified TcpDiscoveryNodesRing#nextNode to use nodeComparator to find the
next node.

You can see it in the PR: https://github.com/apache/ignite/pull/1436

The main question is: how do I test it?

For my local test, I just replaced BaseNodeComparator with this odd
comparator:

new Comparator<TcpDiscoveryNode>() {
    @Override
    public int compare(TcpDiscoveryNode t1, TcpDiscoveryNode t2) {
        // Shuffle nodes consistently: sort by (internalOrder * 3 + 13) % 4,
        // falling back to the default node order when the keys are equal.
        final int ans = Long.compare((t1.internalOrder() * 3L + 13L) % 4L,
            (t2.internalOrder() * 3L + 13L) % 4L);
        return (ans == 0) ? t1.compareTo(t2) : ans;
    }
};

It looks scary, but in fact it just shuffles nodes consistently. If you
have 4 nodes with topology versions 1, 2, 3 and 4, the ring will be 1-4-3-2.

So I think if we just use this shuffle comparator in the old tests and
nothing goes wrong, that's good enough.

But in any case, I don't know how to add that to the tests.

And maybe we need some tests for custom comparators. But in fact,
comparators just need to be valid Java comparators and work the same way on
all nodes.

Any comments are welcome.
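One cheap way to approach the "how do I test it" question is to check the comparator's contract outside a running cluster. The Java sketch below models nodes only by their internal order (a plain Long standing in for TcpDiscoveryNode), reproduces the shuffle key from the comparator above, and checks antisymmetry, transitivity, and that the resulting ring does not depend on the order in which nodes are discovered. The class and helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Contract checks for the shuffle comparator, with nodes modelled by their internal order (Long). */
final class ShuffleComparatorCheck {

    /** Same key as the test comparator: (internalOrder * 3 + 13) % 4. */
    static long key(long internalOrder) {
        return (internalOrder * 3L + 13L) % 4L;
    }

    /** Shuffle by key; on ties fall back to the natural order (stand-in for TcpDiscoveryNode#compareTo). */
    static final Comparator<Long> SHUFFLE = (a, b) -> {
        int ans = Long.compare(key(a), key(b));
        return ans == 0 ? a.compareTo(b) : ans;
    };

    /** Checks sign antisymmetry and transitivity over all pairs and triples of the sample. */
    static boolean isValid(List<Long> sample, Comparator<Long> cmp) {
        for (Long a : sample) {
            for (Long b : sample) {
                if (Integer.signum(cmp.compare(a, b)) != -Integer.signum(cmp.compare(b, a)))
                    return false;
                for (Long c : sample) {
                    if (cmp.compare(a, b) > 0 && cmp.compare(b, c) > 0 && cmp.compare(a, c) <= 0)
                        return false;
                }
            }
        }
        return true;
    }

    /** Ring order produced by sorting; it should not depend on the input order. */
    static List<Long> ring(List<Long> orders, Comparator<Long> cmp) {
        List<Long> sorted = new ArrayList<>(orders);
        sorted.sort(cmp);
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(ring(Arrays.asList(1L, 2L, 3L, 4L), SHUFFLE)); // [1, 4, 3, 2]
        System.out.println(ring(Arrays.asList(3L, 1L, 4L, 2L), SHUFFLE)); // [1, 4, 3, 2]
        List<Long> sample = new ArrayList<>();
        for (long i = 1; i <= 20; i++)
            sample.add(i);
        System.out.println(isValid(sample, SHUFFLE)); // true
    }
}
```

A check like this catches a broken comparator early, but it cannot prove that every node uses the same comparator, which is the risk raised earlier in the thread.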

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

Alexander, sounds good! Please post updates to the ticket and to this thread
(if necessary) while working.

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

I expect it in the weeks after the New Year holidays, or sooner.

2016-12-29 13:28 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:

> Guys, I have updated the ticket.
>
> Alexander Menshikov, when do you expect the implementation to be finished?
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

Guys, I have updated the ticket.

Alexander Menshikov, when do you expect the implementation to be finished?

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

I am OK with CLUSTER_REGION_ID. However, I like the name
DISCOVERY_REGION_ID more.

>>
Do you think we will ever care about the order of nodes within the same
region, e.g. does the order of nodes within the same rack matter?
>>

I think this is too much, but if you care, you still can. Imagine you have
values 1000..1010 for machines in rack 1 and 2000..2010 for machines in
rack 2. That way you can set the exact position of each machine. This
approach provides full flexibility.

I will update https://issues.apache.org/jira/browse/IGNITE-4501 shortly.

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Dmitriy Setrakyan <ds...@apache.org>.

Actually, after giving it some thought, I now think that the same kind of
flexibility can be achieved by giving multiple nodes the same
CLUSTER_REGION_ID (I don't like "arc id"). For example, nodes in 2 racks
could be given CLUSTER_REGION_ID values of 1 and 2. This way all nodes in
rack 1 or rack 2 would be next to each other in the cluster ring.
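To illustrate the grouping, here is a Java sketch in which several nodes share a CLUSTER_REGION_ID and ties are broken by join order, standing in for the existing ordering rule. `RingNode` and both comparators are hypothetical; the example only counts how many ring links join different regions when nodes from two racks join alternately.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Comparator;
import java.util.List;

// Hypothetical node view: CLUSTER_REGION_ID shared by every node in a rack, plus join order.
record RingNode(String name, int clusterRegionId, long joinOrder) {}

final class RegionGrouping {
    // Region first; inside a region, fall back to the join-order rule.
    static final Comparator<RingNode> BY_REGION_THEN_ORDER =
        Comparator.comparingInt(RingNode::clusterRegionId)
                  .thenComparingLong(RingNode::joinOrder);

    // Join order only, for comparison.
    static final Comparator<RingNode> BY_ORDER_ONLY =
        Comparator.comparingLong(RingNode::joinOrder);

    static List<RingNode> ring(Collection<RingNode> nodes, Comparator<RingNode> cmp) {
        List<RingNode> ring = new ArrayList<>(nodes);
        ring.sort(cmp);
        return ring;
    }

    /** Counts ring links, including the wrap-around link, that connect two different regions. */
    static int crossRegionLinks(List<RingNode> ring) {
        int links = 0;
        for (int i = 0; i < ring.size(); i++) {
            if (ring.get(i).clusterRegionId() != ring.get((i + 1) % ring.size()).clusterRegionId())
                links++;
        }
        return links;
    }
}
```

With join order alone, alternating joins give a ring in which all six links cross racks; with the region id compared first, the ring becomes A, C, E, B, D, F and only two links cross.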

Do you think we will ever care about the order of nodes within the same
region, e.g. does the order of nodes within the same rack matter?

D.

On Tue, Dec 27, 2016 at 7:30 AM, Dmitriy Setrakyan <ds...@apache.org>
wrote:

> Completely agree with Alexey here. NodeComparator sounds like a generic
> approach. We can provide various implementations of comparator with
> different sorting strategies out of the box.
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Dmitriy Setrakyan <ds...@apache.org>.

On Tue, Dec 27, 2016 at 2:32 AM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> 2016-12-27 10:42 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:
> > I think the NodeComparator approach will work. User can chose how to sort
> > nodes from one rack before nodes from another rack. Same goes for
> subnets,
> > or data centers.
> > >>
> >
> > Dmitry, can you please explain why you enforce user to write code? This
> > does not seem convenient to me at all. If user wants to write code then
> he
> > can do it for calculating proper arc_id.
> >
>
> Yakov, where is no need to for user to write code. We can provide two
> default Comparator implementations:
> first based on IP address(default), and second based on node attribute.
> User just plugs one of the implementations and adds node attribute to node
> config in second case - let it be ARC_ID by default.


Completely agree with Alexey here. NodeComparator sounds like a generic
approach. We can provide various implementations of comparator with
different sorting strategies out of the box.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

My JIRA account is:
Username: sharpler
Full Name: Alexander Menshikov

2016-12-27 17:22 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:

> Yes, i can. But someone needs to give me the rights of contributor in Jira.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Yes, I can. But someone needs to grant me contributor rights in Jira.

2016-12-27 17:07 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:

> I have described a task: https://issues.apache.org/jira/browse/IGNITE-4501
>
> and linked a bug https://issues.apache.org/jira/browse/IGNITE-4499
>
> Alex Menshikov, maybe you will take her?

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Vyacheslav Daradur <da...@gmail.com>.

I have described a task: https://issues.apache.org/jira/browse/IGNITE-4501

and linked a bug https://issues.apache.org/jira/browse/IGNITE-4499

Alex Menshikov, maybe you will take it?


2016-12-27 13:32 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> 2016-12-27 10:42 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:
>
> > >>
> > My main concern here is code complexity. Yakov, how difficult it is to
> > stick a new node in an arbitrary spot of a discovery ring?
> > >>
> >
> > Dmitry, I think this is not hard. At least I don't see any issue now.
> >
> > >>
> > I think the NodeComparator approach will work. User can chose how to sort
> > nodes from one rack before nodes from another rack. Same goes for
> subnets,
> > or data centers.
> > >>
> >
> > Dmitry, can you please explain why you enforce user to write code? This
> > does not seem convenient to me at all. If user wants to write code then
> he
> > can do it for calculating proper arc_id.
> >
>
> Yakov, where is no need to for user to write code. We can provide two
> default Comparator implementations:
> first based on IP address(default), and second based on node attribute.
> User just plugs one of the implementations and adds node attribute to node
> config in second case - let it be ARC_ID by default.
>
>
> >
> > Another point I already posted to this thread - this is very error prone.
> >
> > >>
> > I am strongly against giving user an opportunity to point exact place in
> > the ring with somewhat like this interface [int getIdex(Node newNode,
> > List<Node> currentRing)]. This is very error prone and may require tricky
> > consistency checks just to make sure that implementation of this
> interface
> > is consistent along the topology.
> > With "arcs" approach user can automatically assign proper ids basing on
> > physical network topology and network routes.
> > >>
> >
> > I still think arc_id is better:
> > 1. No code from user side. Only env variable or system property on a
> > machine.
> > 2. All code inside Ignite - easy to fix and change if required.
> > 3. All benefits of comparator are still available.
> >
>
> I suppose my approach is more generic and also matches listed requirements.
>
>
> >
> > Alex, I still don't get how you (and other guys as well) want to deal
> with
> > latencies here. I would like you explain how you solve this - you have
> 1000
> > IP addresses, and you need to sort them in your beloved latency order,
> but
> > please note that you need to get exactly the same ring on all of these
> 1000
> > machines.
> >
>
> Calculating latencies are beyond scope of generic approach of nodes
> ordering.
> It's just of one of possible NodeComparator implementations.
> Let's not bother this it right now.
>
>
> >
> > --Yakov
> >
>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Alexei Scherbakov <al...@gmail.com>.

2016-12-27 10:42 GMT+03:00 Yakov Zhdanov <yz...@apache.org>:

> >>
> My main concern here is code complexity. Yakov, how difficult it is to
> stick a new node in an arbitrary spot of a discovery ring?
> >>
>
> Dmitry, I think this is not hard. At least I don't see any issue now.
>
> >>
> I think the NodeComparator approach will work. User can chose how to sort
> nodes from one rack before nodes from another rack. Same goes for subnets,
> or data centers.
> >>
>
> Dmitry, can you please explain why you enforce user to write code? This
> does not seem convenient to me at all. If user wants to write code then he
> can do it for calculating proper arc_id.
>

Yakov, there is no need for the user to write code. We can provide two
default Comparator implementations: the first based on IP address (the
default), and the second based on a node attribute. The user just plugs in
one of the implementations and, in the second case, adds the node attribute
to the node config; let it be ARC_ID by default.
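A rough Java sketch of the two proposed defaults. `NodeView`, `DefaultComparators`, and the `ARC_ID` attribute key are illustrative names, not existing Ignite APIs. The IP-based variant compares IPv4 address bytes as unsigned values, so nodes on the same host, and then on the same subnet, end up next to each other. The attribute-based variant reads an integer attribute and places nodes that lack it last. Both fall back to the join order so that the result is a total order.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;

// Hypothetical node view: IPv4 address bytes, user attributes, and join order.
record NodeView(String name, byte[] addr, Map<String, Object> attrs, long order) {}

final class DefaultComparators {
    static final String DEFAULT_ATTR = "ARC_ID";

    /** Default: unsigned lexicographic order of address bytes, so same-host/same-subnet nodes are adjacent. */
    static final Comparator<NodeView> BY_IP =
        ((Comparator<NodeView>) (a, b) -> Arrays.compareUnsigned(a.addr(), b.addr()))
            .thenComparingLong(NodeView::order);

    /** Attribute-based: integer attribute value (must be an Integer), nodes without it sort last. */
    static Comparator<NodeView> byAttribute(String attrName) {
        return Comparator.comparing(
                (NodeView n) -> (Integer) n.attrs().get(attrName),
                Comparator.nullsLast(Comparator.<Integer>naturalOrder()))
            .thenComparingLong(NodeView::order);
    }
}
```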


>
> Another point I already posted to this thread - this is very error prone.
>
> >>
> I am strongly against giving user an opportunity to point exact place in
> the ring with somewhat like this interface [int getIdex(Node newNode,
> List<Node> currentRing)]. This is very error prone and may require tricky
> consistency checks just to make sure that implementation of this interface
> is consistent along the topology.
> With "arcs" approach user can automatically assign proper ids basing on
> physical network topology and network routes.
> >>
>
> I still think arc_id is better:
> 1. No code from user side. Only env variable or system property on a
> machine.
> 2. All code inside Ignite - easy to fix and change if required.
> 3. All benefits of comparator are still available.
>

I suppose my approach is more generic and also matches the listed requirements.


>
> Alex, I still don't get how you (and other guys as well) want to deal with
> latencies here. I would like you explain how you solve this - you have 1000
> IP addresses, and you need to sort them in your beloved latency order, but
> please note that you need to get exactly the same ring on all of these 1000
> machines.
>

Calculating latencies is beyond the scope of a generic approach to node
ordering.
It's just one of the possible NodeComparator implementations.
Let's not bother with it right now.


>
> --Yakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

>>
My main concern here is code complexity. Yakov, how difficult it is to
stick a new node in an arbitrary spot of a discovery ring?
>>

Dmitry, I think this is not hard. At least I don't see any issue now.

>>
I think the NodeComparator approach will work. User can chose how to sort
nodes from one rack before nodes from another rack. Same goes for subnets,
or data centers.
>>

Dmitry, can you please explain why you force the user to write code? This
does not seem convenient to me at all. If the user wants to write code, then
he can do it to calculate the proper arc_id.

Another point I already posted to this thread: this is very error-prone.

>>
I am strongly against giving user an opportunity to point exact place in
the ring with somewhat like this interface [int getIdex(Node newNode,
List<Node> currentRing)]. This is very error prone and may require tricky
consistency checks just to make sure that implementation of this interface
is consistent along the topology.
With "arcs" approach user can automatically assign proper ids basing on
physical network topology and network routes.
>>

I still think arc_id is better:
1. No code on the user side, only an environment variable or system property
on a machine.
2. All code stays inside Ignite, so it is easy to fix and change if required.
3. All the benefits of the comparator are still available.
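For point 1, a minimal Java sketch of how a node might resolve its arc id at startup, assuming a hypothetical `ARC_ID` name for both the system property and the environment variable (neither is an existing Ignite setting). The resolved value could then be published to other nodes, for example as a user attribute through `IgniteConfiguration.setUserAttributes`; that wiring is not shown.

```java
import java.util.Map;
import java.util.Properties;

final class ArcIdConfig {
    // Hypothetical names, chosen for illustration only.
    static final String PROP = "ARC_ID";
    static final String ENV = "ARC_ID";

    /** System property wins, then the environment variable, then the fallback value. */
    static int resolveArcId(Properties sysProps, Map<String, String> env, int fallback) {
        String raw = sysProps.getProperty(PROP);
        if (raw == null)
            raw = env.get(ENV);
        if (raw == null || raw.isBlank())
            return fallback;
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ARC_ID must be an integer, got: " + raw, e);
        }
    }

    /** Variant that reads the real JVM system properties and process environment. */
    static int resolveArcId(int fallback) {
        return resolveArcId(System.getProperties(), System.getenv(), fallback);
    }
}
```

Taking the property and environment maps as parameters keeps the resolution logic testable without touching the real process environment.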

Alex, I still don't get how you (and the other guys as well) want to deal
with latencies here. I would like you to explain how you would solve this:
you have 1000 IP addresses, and you need to sort them in your beloved latency
order, but please note that you need to get exactly the same ring on all of
these 1000 machines.
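The concern can be shown with a tiny Java sketch using made-up latency figures: if every node orders the ring by the latencies it measures itself, two observers can disagree about who follows whom. `ObserverDependence` and the numbers are purely illustrative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class ObserverDependence {
    /** Ring order as one observer would build it from its own latency measurements (ties broken by name). */
    static List<String> ringByLocalLatency(Map<String, Integer> latencyFromObserverMs) {
        List<String> ring = new ArrayList<>(latencyFromObserverMs.keySet());
        ring.sort(Comparator.comparing((String n) -> latencyFromObserverMs.get(n))
                            .thenComparing(Comparator.<String>naturalOrder()));
        return ring;
    }

    /** The node that follows `node` in a ring, wrapping around at the end. */
    static String successor(List<String> ring, String node) {
        return ring.get((ring.indexOf(node) + 1) % ring.size());
    }
}
```

With latencies of A:0, B:1, C:5, D:9, E:12 seen from A and A:5, B:8, C:0, D:9, E:1 seen from C, node A would expect B to be followed by C, while node C would expect B to be followed by D, so the two nodes would build different rings. An attribute such as arc_id, which every node reports identically, does not have this problem.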

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Dmitriy Setrakyan <ds...@apache.org>.

I think the NodeComparator approach will work. The user can choose how to
sort nodes so that nodes from one rack come before nodes from another rack.
The same goes for subnets or data centers.

My main concern here is code complexity. Yakov, how difficult is it to
stick a new node into an arbitrary spot of the discovery ring?

D.

On Mon, Dec 26, 2016 at 12:42 PM, Alexei Scherbakov <
alexey.scherbakoff@gmail.com> wrote:

> Of course where is no need to sort all nodes.
>
> It's enough just to select smallest node.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Alexei Scherbakov <al...@gmail.com>.

Of course, there is no need to sort all nodes.

It's enough to just select the smallest node.
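A Java sketch of that shortcut, assuming the comparator is a total order in which a node compares equal only to itself. A node's successor is the smallest node ranked above it, or the overall smallest node when it is already ranked last, which takes one linear pass instead of an O(n log n) sort. `NextNode` is an illustrative name.

```java
import java.util.Collection;
import java.util.Comparator;

final class NextNode {
    /**
     * Successor of self in the ring ordered by cmp, found in one pass:
     * the smallest node ranked above self, or the overall smallest other node
     * when self is ranked last (wrap-around). Returns null if self is alone.
     */
    static <T> T next(Collection<T> topology, T self, Comparator<? super T> cmp) {
        T bestAbove = null;
        T smallest = null;
        for (T n : topology) {
            if (cmp.compare(n, self) == 0)
                continue; // assumes only self compares equal to self
            if (smallest == null || cmp.compare(n, smallest) < 0)
                smallest = n;
            if (cmp.compare(n, self) > 0 && (bestAbove == null || cmp.compare(n, bestAbove) < 0))
                bestAbove = n;
        }
        return bestAbove != null ? bestAbove : smallest;
    }
}
```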

2016-12-26 22:29 GMT+03:00 Alexei Scherbakov <al...@gmail.com>:

> Yakov,
>
> ARC_ID approach seems just a variation of node attribute based ordering
> for me.
>
> I suggest more generic approach.
>
> What if we define node ordering using something like NodeComparator?
>
> Then a new node joins topology, it calculates node for joining using
> sorting on current topology + new node.
>
> nextNode just takes first element in sorted list. It's guaranteed what all
> nodes will return the same sorted list for the topology version.
>
> We can provide default implementation based on IP address:
>
> nodes on the same host : nodes on the same subnet : other nodes
>
> I think this will work for most cases.
>
> If needed user can provide it's own comparison strategy based on
> latencies, data centers, whatever.
>
> --
>
> Best regards,
> Alexei Scherbakov
>



-- 

Best regards,
Alexei Scherbakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Alexei Scherbakov <al...@gmail.com>.

Yakov,

To me, the ARC_ID approach seems to be just a variation of
node-attribute-based ordering.

I suggest a more generic approach.

What if we define node ordering using something like a NodeComparator?

When a new node joins the topology, it determines the node to join to by
sorting the current topology plus the new node.

nextNode just takes the first element of the sorted list. It's guaranteed
that all nodes will return the same sorted list for a given topology version.

We can provide a default implementation based on IP address:

nodes on the same host : nodes on the same subnet : other nodes

I think this will work for most cases.

If needed, the user can provide their own comparison strategy based on
latencies, data centers, whatever.
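A Java sketch of the join step described above, assuming a hypothetical `NodeComparator` interface that is just a `java.util.Comparator` (not an existing Ignite type). The joining node sorts the current topology together with itself and reads off the neighbours it would sit between. Provided the comparator is deterministic, every node applying it to the same topology version computes the same positions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class JoinPosition {
    /** Hypothetical pluggable ordering, as proposed in this thread. */
    interface NodeComparator<T> extends Comparator<T> {}

    /** The nodes immediately before and after the joining node in the new ring. */
    record Neighbours<T>(T predecessor, T successor) {}

    /** Sorts the current topology plus the joining node and returns the joining node's neighbours. */
    static <T> Neighbours<T> neighboursOnJoin(List<T> topology, T joining, NodeComparator<T> cmp) {
        List<T> ring = new ArrayList<>(topology);
        ring.add(joining);
        ring.sort(cmp);
        int i = ring.indexOf(joining);
        int n = ring.size();
        // Wrap around at both ends, since the discovery ring is circular.
        return new Neighbours<>(ring.get((i - 1 + n) % n), ring.get((i + 1) % n));
    }
}
```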

2016-12-26 17:17 GMT+03:00 Александр Меньшиков <sh...@gmail.com>:

> > Can you please explain why this is better than arc approach?
>
> We had a misunderstanding. Everything okay with arc approach. But we must
> choose how nodes will determine "ARC_ID", and i think it can be calculated
> from latency values. If users will be able to set "ARC_ID" in config file
> then they can set different 'ARC_ID' on all nodes, and, in fact, point
> exact place in the ring, what we would like to avoid.
>



-- 

Best regards,
Alexei Scherbakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

> Can you please explain why this is better than arc approach?

We had a misunderstanding. Everything is okay with the arc approach. But we
must choose how nodes will determine "ARC_ID", and I think it can be
calculated from latency values. If users are able to set "ARC_ID" in the
config file, then they can set a different ARC_ID on every node and, in
effect, point to an exact place in the ring, which is what we would like to
avoid.

2016-12-26 15:36 GMT+03:00 Vyacheslav Daradur <da...@gmail.com>:

> >>
> Vyacheslav, I agree that latency increase in the way you describe, but I
> still don't understand how we use this information in discovery. Latency
> may differ from time to time depending on many factors. I still think that
> arc approach is more intuitive for user and easier to implement.
> >>
>
> Way of latency increase is just a main idea.
>
> I suggest to connect new node on some priority.
> General approach:
> --
> if [ there are same host node ] then [ connect with it ]
> else if [ there are same subnet nodes] then [ connect with one of them ]
>  // how to choose node from a set of subnet? - choose with min latency each
> other
> else [ connect to remote nodes ] // how to choose node from a set of
> remotes? - choose with min latency each other
> --
> Maybe we can describe another intermediate steps.
>

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Vyacheslav Daradur <da...@gmail.com>.

>>
Vyacheslav, I agree that latency increase in the way you describe, but I
still don't understand how we use this information in discovery. Latency
may differ from time to time depending on many factors. I still think that
arc approach is more intuitive for user and easier to implement.
>>

The way latency increases (same host, then same subnet, then remote) is just
the main idea.

I suggest connecting a new node according to a priority.
General approach:
--
if [ there is a node on the same host ] then [ connect to it ]
else if [ there are nodes on the same subnet ] then [ connect to one of them ]
    // how to choose a node from the subnet set? Choose the one with the lowest latency
else [ connect to a remote node ]
    // how to choose a node from the set of remotes? Choose the one with the lowest latency
--
Maybe we can describe other intermediate steps.
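The priority rule could be sketched in Java roughly as follows. `Candidate` and `PriorityConnect` are hypothetical, and `latencyMs` is assumed to be a value this node measured itself. As noted in the quoted reply, such values vary over time and between observers, so this picks a neighbour locally rather than defining a ring that every node agrees on.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical candidate view: host, subnet, and a latency measured by the joining node.
record Candidate(String name, String host, String subnet, double latencyMs) {}

final class PriorityConnect {
    static Optional<Candidate> choose(String myHost, String mySubnet, List<Candidate> candidates) {
        Comparator<Candidate> byLatency = Comparator.comparingDouble(Candidate::latencyMs);

        // 1. A node on the same host, if there is one.
        Optional<Candidate> sameHost = candidates.stream()
            .filter(c -> c.host().equals(myHost)).min(byLatency);
        if (sameHost.isPresent())
            return sameHost;

        // 2. Otherwise the lowest-latency node on the same subnet.
        Optional<Candidate> sameSubnet = candidates.stream()
            .filter(c -> c.subnet().equals(mySubnet)).min(byLatency);
        if (sameSubnet.isPresent())
            return sameSubnet;

        // 3. Otherwise the lowest-latency remote node (empty if there are no candidates).
        return candidates.stream().min(byLatency);
    }
}
```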


Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

>>
I just want to understand which benefits we get by implementing what we're
talking about. If the major benefit is reduced latency of ring messages,
then assigning 'ARC ID' according to latency values is quite enough. But if
there are hidden problems caused by a large number of reconnections (like I
described in the first message of this discussion), then it is better to
find a way to determine the real physical location.
>>

I suggest solving ring building and reducing the number of reconnects
separately. If we have AxB-C-D-A, then A will try to reconnect to B, then to
C, then to D. This is how discovery works now. I agree this should be fixed,
and I have a couple of ideas on how we can do it, but let's keep these
separate.
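The reconnect behavior described above can be modeled with a small Java sketch, assuming the ring is a list of node names and that every surviving node walks forward (B, then C, then D) until it reaches a live node. `RingReconnectSketch` and its methods are hypothetical; the model ignores timeouts and message handling in TcpDiscoverySpi. The two rings are the grouped and interleaved examples from the first message of this thread, with D, E and F failing.

```java
import java.util.List;
import java.util.Set;

// Hypothetical model, not Ignite code: the ring is a list of node names in ring order.
// Each alive node walks forward (next, next-but-one, ...) until it reaches an alive node.
final class RingReconnectSketch {

    /** Alive nodes whose direct successor failed, i.e. nodes that must reconnect. */
    static int reconnectingNodes(List<String> ring, Set<String> failed) {
        int n = ring.size();
        int count = 0;
        for (int i = 0; i < n; i++) {
            boolean alive = !failed.contains(ring.get(i));
            boolean nextFailed = failed.contains(ring.get((i + 1) % n));
            if (alive && nextFailed)
                count++;
        }
        return count;
    }

    /** Total connection attempts to failed nodes made before each alive node finds a live one. */
    static int failedAttempts(List<String> ring, Set<String> failed) {
        int n = ring.size();
        int total = 0;
        for (int i = 0; i < n; i++) {
            if (failed.contains(ring.get(i)))
                continue; // failed nodes do not reconnect
            for (int step = 1; step < n && failed.contains(ring.get((i + step) % n)); step++)
                total++; // one failed attempt per skipped node
        }
        return total;
    }

    public static void main(String[] args) {
        Set<String> failed = Set.of("D", "E", "F");
        List<String> grouped = List.of("A", "B", "C", "D", "E", "F");     // A-B-CxD-E-FxA
        List<String> interleaved = List.of("A", "F", "B", "E", "C", "D"); // AxFxBxExCxDxA
        System.out.println(reconnectingNodes(grouped, failed) + " " + failedAttempts(grouped, failed));
        System.out.println(reconnectingNodes(interleaved, failed) + " " + failedAttempts(interleaved, failed));
    }
}
```

In this model `main` prints `1 3` and then `3 3`: grouping reduces the number of nodes that must reconnect from three to one, while the total number of failed connection attempts stays at three, because the single reconnecting node (C) has to skip D, E and F one by one.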

>>
Okay, then I think Vyacheslav's idea (using latency values) is quite enough
when we can't determine the real physical location.
>>

Can you please explain why this is better than the arc approach?

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

> I am afraid I did not understand this at all. Please elaborate.

I just want to understand which benefits we get by implementing what we're
talking about. If the major benefit is reduced latency of ring messages,
then assigning 'ARC ID' according to latency values is quite enough. But if
there are hidden problems caused by a large number of reconnections (like I
described in the first message of this discussion), then it is better to
find a way to determine the real physical location.

> And, yes, a properly built ring should reduce the latency of ring messages, IMO.

Okay, then I think Vyacheslav's idea (using latency values) is quite enough
when we can't determine the real physical location.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

> Then, as I understand it, a large number of reconnections in the ring
> cannot create major performance problems, even temporary ones. And in
> general this optimization will change practically nothing. Or am I missing
> something?

I am afraid I did not understand this at all. Please elaborate.

I did not suggest any reconnections or ring rebuilds. All I suggest is to
control the ring building process with arcs. And, yes, a properly built ring
should reduce the latency of ring messages, IMO.

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Александр Меньшиков <sh...@gmail.com>.

Thank you, Denis, for your explanation. Then, as I understand it, a large
number of reconnections in the ring cannot create major performance
problems, even temporary ones. And in general this optimization will change
practically nothing. Or am I missing something?

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Yakov Zhdanov <yz...@apache.org>.

>>
For example, ordering by latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

Maybe it would be better to use some metrics from the ClusterMetrics interface.

The ordering algorithm can be implemented in a class such as a Comparator and
used when we build a cluster or select a place for a new node.
>>

Vyacheslav, please elaborate on how we can determine whether we are on the
same rack. I am not sure this is possible in the general case. Please see my
suggestions below.

>>
However, here is the concern I have. Currently, when a new node joins, the
coordinator assigns an order number to it (e.g. if we already have nodes
1, 2 and 3, the new node will have order 4). This node will then be the
last one on the ring, i.e. nodes are always ordered in the ring by this
order number (1->2->3->4->1). If we change this, we will basically allow a
node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
sure whether this is going to cause issues, but it sounds dangerous.

Yakov, can you please chime in and share your thoughts on this?
>>

I don't think this will cause issues. Node ordering and placement are
implemented in TcpDiscoveryNodesRing, and I think we will just need to
alter the logic of
org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>).

As for the design, I would suggest the following.

1. The user should be able to define an ARC_ID for the node. I suggest
"arc" for this since we are using the "ring" concept. This will be the most
honored characteristic for node placement. By default arc_id is 0, and it
can be set with the system property IGNITE_DISCO_ARC_ID, an environment
variable, or via TcpDiscoverySpi.setArcId() (a new method).
So, if I have nodes A, D, G with arc_id set to 1 and B, Z with arc_id set
to 5, then the ring should be built as follows: A->D->G->B->Z->A. Here arcs
can represent different racks or data centers.

I am strongly against giving the user an opportunity to specify the exact
place in the ring with something like this interface: [int getIndex(Node
newNode, List<Node> currentRing)]. This is very error-prone and may require
tricky consistency checks just to make sure that implementations of this
interface are consistent across the topology.
With the "arcs" approach, the user can automatically assign proper IDs based
on the physical network topology and network routes.

2. Subnet - 2nd honored parameter. Nodes on the same subnet should be
placed side by side in the same arc.

3. Physical host - 3rd honored parameter. Nodes on the same physical host
should be placed together automatically in the same arc.

4. The new mode involving points 1-3 should become the default, and we should
also provide the ability to switch to the current mode, which should become
legacy.
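Points 1-4 can be sketched as a Java Comparator, assuming each node carries a hypothetical arc ID along with string identifiers for its subnet and physical host. `DiscoNode`, `ARC_AWARE` and `LEGACY` are illustrative names; `arcId` stands in for the proposed ARC_ID setting and is not an existing TcpDiscoverySpi property. The coordinator-assigned join order breaks remaining ties, so the ordering is total and every node sorting the same topology gets the same ring.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the proposed placement rules; none of these types exist in Ignite.
// Key priority: arc ID, then subnet, then physical host, then coordinator-assigned join order.
final class ArcRingSketch {

    record DiscoNode(String name, long order, int arcId, String subnet, String host) {}

    // Current ("legacy") mode: ring order is the join order only.
    static final Comparator<DiscoNode> LEGACY = Comparator.comparingLong(DiscoNode::order);

    // Proposed default mode (points 1-3), with join order as the final tie-breaker.
    static final Comparator<DiscoNode> ARC_AWARE =
        Comparator.comparingInt(DiscoNode::arcId)
            .thenComparing(DiscoNode::subnet)
            .thenComparing(DiscoNode::host)
            .thenComparingLong(DiscoNode::order);

    /** Node names in ring order; the last node links back to the first. */
    static List<String> ring(List<DiscoNode> nodes, boolean legacyMode) {
        List<DiscoNode> sorted = new ArrayList<>(nodes);
        sorted.sort(legacyMode ? LEGACY : ARC_AWARE);
        List<String> names = new ArrayList<>();
        for (DiscoNode n : sorted)
            names.add(n.name());
        return names;
    }

    public static void main(String[] args) {
        // Example from point 1: A, D, G in arc 1 and B, Z in arc 5; B joined first.
        List<DiscoNode> nodes = List.of(
            new DiscoNode("B", 1, 5, "10.0.2", "h4"),
            new DiscoNode("A", 2, 1, "10.0.1", "h1"),
            new DiscoNode("Z", 3, 5, "10.0.2", "h5"),
            new DiscoNode("D", 4, 1, "10.0.1", "h1"),
            new DiscoNode("G", 5, 1, "10.0.1", "h2"));
        System.out.println(ring(nodes, false)); // [A, D, G, B, Z]
        System.out.println(ring(nodes, true));  // [B, A, Z, D, G]
    }
}
```

The example reproduces A->D->G->B->Z->A from point 1 even though B joined first, while `LEGACY` keeps today's join-order ring.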

--Yakov

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Valentin Kulichenko <va...@gmail.com>.

Hi Vyacheslav,

Discovery logic is encapsulated in TcpDiscoverySpi.
TcpDiscoveryMulticastIpFinder is one of many implementations of the IP
finder. The only purpose of the IP finder is to provide a list of addresses
where a node can send its initial join request, and the fact that it sends
this initial request to node A doesn't actually mean that it will be
connected to A within the ring. Having said that, I doubt the IP finder will
be affected in any way if the discussed change is implemented.

The discovery protocol already maintains consistent information about the
ring, so any node in the topology already knows everything about the other
nodes, including their ordering in the ring. So at the discovery level it
should not be very difficult to customize where a joining node is placed on
the ring.

However, here is the concern I have. Currently, when a new node joins, the
coordinator assigns an order number to it (e.g. if we already have nodes
1, 2 and 3, the new node will have order 4). This node will then be the
last one on the ring, i.e. nodes are always ordered in the ring by this
order number (1->2->3->4->1). If we change this, we will basically allow a
node to be placed anywhere else (something like 1->2->4->3->1). I'm not 100%
sure whether this is going to cause issues, but it sounds dangerous.
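The difference between the current and a customized ring can be sketched in Java with a sorted set, where a node's successor is the next element and the last node wraps around to the first. This assumes a hypothetical `group` placement key; `RingSuccessorSketch` is illustrative only and does not reproduce TcpDiscoveryNodesRing.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch, not TcpDiscoveryNodesRing code: the ring is a sorted set and a
// node's successor is the next element, wrapping around from the last to the first.
final class RingSuccessorSketch {

    // order: number assigned by the coordinator on join; group: hypothetical placement key.
    record RingNode(long order, int group) {}

    // Current rule as described above: ring position follows the join order.
    static final Comparator<RingNode> BY_ORDER = Comparator.comparingLong(RingNode::order);

    // Custom rule: group first, join order as tie-breaker so distinct nodes never compare equal.
    static final Comparator<RingNode> BY_GROUP =
        Comparator.comparingInt(RingNode::group).thenComparingLong(RingNode::order);

    static RingNode next(NavigableSet<RingNode> ring, RingNode node) {
        RingNode higher = ring.higher(node);
        return higher != null ? higher : ring.first(); // wrap around
    }

    /** Walks the ring once from its first node and returns the join orders visited. */
    static List<Long> walk(List<RingNode> nodes, Comparator<RingNode> cmp) {
        NavigableSet<RingNode> ring = new TreeSet<>(cmp);
        ring.addAll(nodes);
        List<Long> visited = new ArrayList<>();
        if (ring.isEmpty())
            return visited;
        RingNode cur = ring.first();
        for (int i = 0; i < ring.size(); i++) {
            visited.add(cur.order());
            cur = next(ring, cur);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Nodes 1, 2 and 4 share group 0; node 3 is in group 1.
        List<RingNode> nodes = List.of(
            new RingNode(1, 0), new RingNode(2, 0), new RingNode(3, 1), new RingNode(4, 0));
        System.out.println(walk(nodes, BY_ORDER)); // [1, 2, 3, 4]
        System.out.println(walk(nodes, BY_GROUP)); // [1, 2, 4, 3]
    }
}
```

Ordering by join order gives 1->2->3->4->1; ordering by group first gives 1->2->4->3->1, the case described above. Keeping the join order as a tie-breaker ensures the TreeSet never treats two distinct nodes as duplicates.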

Yakov, can you please chime in and share your thoughts on this?

-Val

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Vyacheslav Daradur <da...@gmail.com>.

Thanks for the reply.

I have some questions:

1. Where is the logic of Ignite cluster building implemented? DiscoverySpi and
TcpDiscoveryMulticastIpFinder?

2. Which standard Ignite metrics can you recommend using for node ordering?

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Dmitriy Setrakyan <ds...@apache.org>.

I think having some user-defined ordering can be beneficial. However, we
are only talking about the node discovery protocol here, which maintains the
cluster. All other communication between nodes happens directly (it does
not go through the ring).

D.

Re: Sort nodes in the ring in order to minimize the number of reconnections

Posted by Vyacheslav Daradur <da...@gmail.com>.

Hello, Alex!

I think it is a great idea.

I suggest building communication between nodes based on weight (or priority).

For example, ordering by latency:
- nodes on one host = 1
- nodes in one rack-blade = 2
- nodes in one server-rack = 3
- nodes in one physical cluster = 4
- nodes in one subnet = 5
- etc.

Maybe it would be better to use some metrics from the ClusterMetrics interface.

The ordering algorithm can be implemented in a class such as a Comparator and
used when we build a cluster or select a place for a new node.
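One way to use the closeness levels listed above is to place a joining node right after the existing ring member it is closest to. A minimal Java sketch, assuming each node is described by hypothetical subnet, physical cluster, rack, blade and host identifiers that nest inside one another; `PlacementSketch`, `closeness` and `insertIndex` are illustrative names, and no ClusterMetrics data is used.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Ignite code. Closeness levels follow the list above:
// 1 = same host, 2 = same rack-blade, 3 = same server rack, 4 = same physical
// cluster, 5 = same subnet, 6 = none of these. Locations are assumed to nest.
final class PlacementSketch {

    record Node(String name, String subnet, String cluster, String rack,
                String blade, String host) {}

    static int closeness(Node a, Node b) {
        if (!a.subnet().equals(b.subnet())) return 6;
        if (!a.cluster().equals(b.cluster())) return 5;
        if (!a.rack().equals(b.rack())) return 4;
        if (!a.blade().equals(b.blade())) return 3;
        if (!a.host().equals(b.host())) return 2;
        return 1;
    }

    /** Position for a joining node: right after the closest existing node (later one on ties). */
    static int insertIndex(List<Node> ring, Node joining) {
        int best = Integer.MAX_VALUE;
        int index = ring.size();
        for (int i = 0; i < ring.size(); i++) {
            int level = closeness(ring.get(i), joining);
            if (level <= best) {
                best = level;
                index = i + 1;
            }
        }
        return index;
    }

    static List<String> names(List<Node> ring) {
        List<String> out = new ArrayList<>();
        for (Node n : ring)
            out.add(n.name());
        return out;
    }

    public static void main(String[] args) {
        List<Node> ring = new ArrayList<>(List.of(
            new Node("A", "net1", "c1", "r1", "b1", "h1"),
            new Node("B", "net1", "c1", "r2", "b1", "h2"),
            new Node("C", "net1", "c1", "r1", "b2", "h3")));
        Node joining = new Node("D", "net1", "c1", "r2", "b1", "h9"); // same blade as B
        ring.add(insertIndex(ring, joining), joining);
        System.out.println(names(ring)); // [A, B, D, C]
    }
}
```

Returning the later position on ties keeps a newcomer at the end of its group, and a node that shares no level with anyone is appended at the end of the ring, matching the current behavior.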

--
With best regards,
Vyacheslav Daradur
