Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2015/01/26 10:08:34 UTC

[jira] [Resolved] (SLING-3726) Topology contains duplicated instances

     [ https://issues.apache.org/jira/browse/SLING-3726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved SLING-3726.
--------------------------------
    Resolution: Fixed

From a code point of view it is still possible to end up with the same instance reported multiple times in the topology (see the comments in TopologyViewImpl.addInstances; more details below). The underlying problem, however, is stale announcements, which has now been fixed with SLING-4139. So the fix for SLING-3726 is SLING-4139.

Re TopologyViewImpl.addInstances: consider the following scenario:
 * instance 1 has an announcement from instance 3 containing only itself
 * instance 2 (same cluster as 1) also has an announcement from instance 3, but this time instance 3 reports instance 3 and instance 4.

One of the above two announcements must be outdated as both can't be correct.

Nevertheless, when the topology is composed (DiscoveryServiceImpl.getTopology()), the algorithm might first add instance 1's announcement, which contains only instance 3; so far all is fine. It then adds instance 2's announcement from instance 3. At that point it notes that 3 is already in the list and refuses to add it again. But it goes on to add 4, which is not in the list yet, so that is fine too. Adding 4, however, has the side effect of bringing instance 3 into the topology a second time: 4 has a link to the cluster containing it, and that cluster also contains 3.
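
The scenario above can be reproduced with a small, self-contained model. This is only a sketch: the classes Cluster, Instance and Topology below are hypothetical stand-ins and not the actual TopologyViewImpl or DiscoveryServiceImpl code. It assumes that duplicates are filtered per instance id while each instance keeps a link to the cluster it was announced with.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DuplicateTopologySketch {

    /** Hypothetical stand-in for a cluster view: an id plus its members. */
    static final class Cluster {
        final String id;
        final List<Instance> members = new ArrayList<>();

        Cluster(String id) {
            this.id = id;
        }

        /** Creates an instance that is linked back to this cluster. */
        Instance join(String slingId) {
            Instance instance = new Instance(slingId, this);
            members.add(instance);
            return instance;
        }
    }

    /** Hypothetical stand-in for an instance description. */
    static final class Instance {
        final String slingId;
        final Cluster cluster;

        Instance(String slingId, Cluster cluster) {
            this.slingId = slingId;
            this.cluster = cluster;
        }
    }

    /** Simplified topology that rejects an instance id it already knows. */
    static final class Topology {
        private final List<Instance> instances = new ArrayList<>();

        void addInstances(List<Instance> announced) {
            for (Instance candidate : announced) {
                boolean known = instances.stream()
                        .anyMatch(i -> i.slingId.equals(candidate.slingId));
                if (known) {
                    continue; // corresponds to "cannot add same instance twice"
                }
                instances.add(candidate);
            }
        }

        /** Lists instance ids by walking the clusters of all added instances. */
        List<String> instanceIdsViaClusters() {
            Set<Cluster> clusters = new LinkedHashSet<>();
            for (Instance instance : instances) {
                clusters.add(instance.cluster);
            }
            List<String> ids = new ArrayList<>();
            for (Cluster cluster : clusters) {
                for (Instance member : cluster.members) {
                    ids.add(member.slingId);
                }
            }
            return ids;
        }
    }

    public static List<String> demo() {
        // Announcement held by instance 1: instance 3 reports only itself.
        Cluster viaInstance1 = new Cluster("announced-via-1");
        viaInstance1.join("3");

        // Announcement held by instance 2: instance 3 reports 3 and 4.
        Cluster viaInstance2 = new Cluster("announced-via-2");
        viaInstance2.join("3");
        viaInstance2.join("4");

        Topology topology = new Topology();
        topology.addInstances(viaInstance1.members); // adds 3
        topology.addInstances(viaInstance2.members); // skips 3, adds 4
        return topology.instanceIdsViaClusters();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [3, 3, 4]
    }
}
```

In this model the id check works as intended, yet instance 3 still shows up twice, once per cluster. The duplicate only disappears when the two announcements agree, which is why removing stale announcements addresses the symptom without changing the algorithm.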

After further analysis, though, it became clear that the above algorithm does not need to be changed and that the problem is rather stale announcements, which is fixed as mentioned in SLING-4139.

Hence considering SLING-3726 fixed as well.

> Topology contains duplicated instances
> --------------------------------------
>
>                 Key: SLING-3726
>                 URL: https://issues.apache.org/jira/browse/SLING-3726
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.4
>            Reporter: Timothee Maret
>            Assignee: Stefan Egli
>              Labels: discovery
>             Fix For: Discovery Impl 1.0.14
>
>
> In our setup, we experience duplicated instances reported in the topology.
> The duplicated instance is reported in two different clusters.
> One of the duplicated instances contains no properties (when accessed via the Discovery APIs).
> This blocks us from relying on the properties announced by the instances.
> Our setup is composed of a set of CRX active/passive clusters, as in the diagram below.
> {noformat}
>                -> ELB -> CRX active/passive cluster
>               |
> Dispatcher -> |-> ELB -> CRX active/passive cluster
>               .
>               .
>               .
>               |
>                -> ELB -> CRX active/passive cluster
> {noformat}
> The discovery service is configured to create a star topology, connecting all instances to a central instance.
> All clusters run the same code which embeds org.apache.sling.discovery.impl 1.0.8
> The issue may have been introduced in org.apache.sling.discovery.impl 1.0.4 since we did not experience it with previous releases.
> In one occurrence of the issue, the duplicated instance identifier was: 10b323d0-b59e-4f87-8370-a15aab1bdc24
> The server logs contain the trace [0].
> We noticed that all clusters contained the structure [1], which seems to be the cause of the duplicate.
> The workaround of removing [1] from the repository of all instances removed the duplicated instance from the topology.
> We checked that all instances in the topology have a unique Sling identifier (looking in sling.id.file).
> We also checked that the structure [1] was not created by a mechanism external to the Sling discovery (e.g. content package or initial content).
> [0] (IP, path and properties are edited)
> {noformat}
> 21.05.2014 07:43:06.756 *INFO* [192.168.0.1 [1400658186712] POST /some/service.json HTTP/1.1] org.apache.sling.discovery.impl.topology.TopologyViewImpl addInstance: cannot add same instance twice: an InstanceDescription[slindId=10b323d0-b59e-4f87-8370-a15aab1bdc24, isLeader=false, isOwn=false, clusterViewId=e5df113c-03a8-48bb-9fee-63cf2a8a6ab3, properties={ ... }]
> {noformat}
> [1] /var/discovery/impl/clusterInstances/10b323d0-b59e-4f87-8370-a15aab1bdc24



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)