Posted to issues@ignite.apache.org by "Alexander Menshikov (JIRA)" <ji...@apache.org> on 2017/05/02 12:00:07 UTC

[jira] [Comment Edited] (IGNITE-4501) Improvement of connection in a cluster of new node

    [ https://issues.apache.org/jira/browse/IGNITE-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992726#comment-15992726 ] 

Alexander Menshikov edited comment on IGNITE-4501 at 5/2/17 12:00 PM:
----------------------------------------------------------------------

[~yzhdanov]
Yakov, I have fixed this problem too, but found a new one. In GridDhtPartitionTopologyImpl#partitionMap(boolean onlyActive) I get an assertion error because node2part.valid() is false. I spent a week trying to understand what that means and how it is connected with the discovery ring, but failed.


was (Author: sharpler):
[~yzhdanov]
Yakov, I has fix this problem too, but fount new one. In the GridDhtPartitionTopologyImpl#artitionMap(boolean onlyActive) I get an assertion because node2part.valid() is false. I spent i week in trying to understand what does it mean and how it connected with the discovery ring, but failed.

> Improvement of connection in a cluster of new node
> --------------------------------------------------
>
>                 Key: IGNITE-4501
>                 URL: https://issues.apache.org/jira/browse/IGNITE-4501
>             Project: Ignite
>          Issue Type: Improvement
>          Components: messaging
>    Affects Versions: 1.8
>            Reporter: Vyacheslav Daradur
>            Assignee: Alexander Menshikov
>              Labels: important
>             Fix For: 2.1
>
>
> h3. Main description:
> Cluster nodes are connected in a ring.
> For example, we have 6 nodes: A, B, C, D, E, F.
> They can form a ring in any possible order: A-B-C-D-E-F-A, or A-F-B-E-C-D-A, etc.
> If a node leaves the topology, its adjacent nodes must reconnect.
> If nodes A, B, C are in one physical location, nodes D, E, F are in another, and the two locations lose connection to each other, there are many possible ways to reconnect.
> In the best case, if we had the ring A-B-CxD-E-FxA ('x' means disconnect), only one reconnection is needed (C will be connected to A, or F will be connected to D, depending on which part of the cluster stays alive).
> In the worst case, if we had the ring AxFxBxExCxDxA, a lot of reconnections are needed (A to B, B to C, C to A; in general n/2 reconnections, where n is the number of nodes).
> h3. Approach:
> It is necessary to develop an approach for inserting a node at the correct place, so that the correct ring topology is created.
> h3. Solutions:
> The main idea is sorting according to latency.
> * group nodes into arcs by an ARC_ID. (manually?)
> * implement NodeComparator (nodes on the same host : nodes on the same subnet : other nodes). We will use it when we connect a new node.
> * [dev list thread|http://mail-archives.apache.org/mod_mbox/ignite-dev/201612.mbox/%3CCAN+WSNyWYXSXEBpGErVt72zTgi2pTQzUWLv8JY=Ke83-5-Rh9g@mail.gmail.com%3E]
> Update Dec 29, Yakov Zhdanov:
> # Introduce a CLUSTER_REGION_ID node attribute. This can be done by adding a public static final constant to TcpDiscoverySpi.
> # Alter org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNodesRing#nextNode(java.util.Collection<org.apache.ignite.spi.discovery.tcp.internal.TcpDiscoveryNode>) to order nodes based on the per-node attribute value.
> # Node comparison should be stable and consistent. E.g. if CLUSTER_REGION_IDs are equal, then we should compare the nodes' IDs. This way we have a consistent order on all nodes in the topology.
> # Also, nextNode() has to group nodes on the same host and in the same subnet. This can be postponed and implemented after the other points are done.
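
The ordering idea from the description can be illustrated with a small standalone sketch. It is not Ignite code: Node, regionId, REGION_THEN_ID and crossRegionLinks are hypothetical names invented for the example, and regionId only plays the role of the proposed CLUSTER_REGION_ID attribute value. The sketch assumes that a ring link between two regions is one that breaks when the regions lose connection, and it shows that sorting by region first, then by node ID as a tie-breaker, reduces the number of such links for the six-node example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class RingOrderSketch {
    /** Hypothetical stand-in for a discovery node: an ID plus a region attribute. */
    static final class Node {
        final String id;
        final String regionId; // plays the role of the CLUSTER_REGION_ID attribute value

        Node(String id, String regionId) {
            this.id = id;
            this.regionId = regionId;
        }
    }

    /**
     * Region first, then node ID as a tie-breaker. The order depends only on
     * node data, so every node in the topology computes the same ring.
     */
    static final Comparator<Node> REGION_THEN_ID =
        Comparator.comparing((Node n) -> n.regionId).thenComparing(n -> n.id);

    /** Counts ring links (including the closing link last -> first) that join different regions. */
    static int crossRegionLinks(List<Node> ring) {
        int cnt = 0;
        for (int i = 0; i < ring.size(); i++) {
            Node cur = ring.get(i);
            Node next = ring.get((i + 1) % ring.size());
            if (!cur.regionId.equals(next.regionId))
                cnt++;
        }
        return cnt;
    }

    /** Ring A-F-B-E-C-D-A from the description: A, B, C in region r1; D, E, F in region r2. */
    static List<Node> exampleRing() {
        return Arrays.asList(
            new Node("A", "r1"), new Node("F", "r2"), new Node("B", "r1"),
            new Node("E", "r2"), new Node("C", "r1"), new Node("D", "r2"));
    }

    /** Returns a copy of the ring ordered by REGION_THEN_ID. */
    static List<Node> sortedCopy(List<Node> ring) {
        List<Node> res = new ArrayList<>(ring);
        res.sort(REGION_THEN_ID);
        return res;
    }

    public static void main(String[] args) {
        List<Node> ring = exampleRing();
        System.out.println(crossRegionLinks(ring));             // prints 6
        System.out.println(crossRegionLinks(sortedCopy(ring))); // prints 2
    }
}
```

With the sorted order the ring becomes A-B-C-D-E-F-A, so only the C-D and F-A links cross regions, which matches the best case in the description. Grouping by host and subnet would need further comparison keys and is left out here.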


