Posted to commits@celix.apache.org by "Gabriele Ricciardi (JIRA)" <ji...@apache.org> on 2016/09/26 12:06:20 UTC

[jira] [Created] (CELIX-375) Topology manager deadlocks when interacts with dependency manager

Gabriele Ricciardi created CELIX-375:
----------------------------------------

             Summary: Topology manager deadlocks when interacts with dependency manager
                 Key: CELIX-375
                 URL: https://issues.apache.org/jira/browse/CELIX-375
             Project: Celix
          Issue Type: Bug
          Components: Remote Service Admin
            Reporter: Gabriele Ricciardi


When interacting with the Dependency Manager, the Topology Manager deadlocks whenever a required dependency is satisfied remotely (i.e., a remote Celix instance exports a service that satisfies the required dependency). The issue occurs every time.

The target configuration includes the Dependency Manager, Topology Manager, Remote Service Admin (RSA) and ETCD Discovery.

How to reproduce it:

- Start up a framework (F1) with a component C1 that depends on service S1 and provides service S2. The DM shell command shows that C1 is not started, because S1 is missing.
- Start up a second framework (F2) with a component C2 that provides service S1. F2 starts up fine, and in F1 component C1 is started because its dependency is now satisfied. From this point on, however, F1 is deadlocked: no more services are detected, and framework_stop hangs.

Explanation:
 
- As soon as the component C2 in F2 starts, a new endpoint is created and exposed in ETCD.
- F1 detects this service and correctly imports it. To do so, topologyManager_addImportedService locks rsaListLock and starts importing the detected service through the RSAs.
- rsa_importService calls into the service registry, which in turn notifies the service tracker registered by the DM.
- The Dependency Manager recognizes that the dependency required by C1 is now satisfied, so it starts C1.
- While C1 is starting, all of the services it provides have to be exported. C1 provides S2, so S2 has to be exported.
- Exporting S2 triggers the Topology Manager, which calls topologyManager_addExportedService.
- topologyManager_addExportedService tries to lock rsaListLock to access the list of RSAs.
- All of this happens on the same thread (the DM is linked as a library and has no thread of its own). topologyManager_addExportedService therefore blocks on rsaListLock, which that same thread already holds, while topologyManager_addImportedService cannot complete (and release the lock) until all the stacked calls return.
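The call chain above can be modeled with a minimal sketch in plain POSIX threads. The function names are hypothetical stand-ins for the Topology Manager and RSA calls, not Celix's actual API, and the long registry/DM/component-start chain is collapsed into one direct call. The mutex is assumed to be created as PTHREAD_MUTEX_ERRORCHECK purely so that the self-deadlock shows up as an EDEADLK return instead of a hung process; with PTHREAD_MUTEX_NORMAL the nested lock would block forever, and with PTHREAD_MUTEX_DEFAULT POSIX leaves the behavior undefined.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_ERRORCHECK in <pthread.h> */
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the Topology Manager's rsaListLock. */
static pthread_mutex_t rsa_list_lock;
/* Return code of the nested lock attempt made on the import path. */
static int nested_lock_rc = -1;

/* Models topologyManager_addExportedService: it needs the RSA list lock. */
static void add_exported_service(void) {
    nested_lock_rc = pthread_mutex_lock(&rsa_list_lock);
    if (nested_lock_rc == 0) {
        pthread_mutex_unlock(&rsa_list_lock);
    }
}

/* Models the synchronous chain rsa_importService -> service registry ->
 * DM service tracker -> start C1 -> export S2, all on one thread. */
static void import_through_rsa(void) {
    add_exported_service();
}

/* Models topologyManager_addImportedService: holds the lock while importing. */
static void add_imported_service(void) {
    pthread_mutex_lock(&rsa_list_lock);
    import_through_rsa(); /* the lock is still held by this thread here */
    pthread_mutex_unlock(&rsa_list_lock);
}

/* Runs the chain once and returns the nested lock's result code:
 * EDEADLK means the thread tried to re-acquire a lock it already owns. */
int run_import_chain(void) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    int rc = pthread_mutex_init(&rsa_list_lock, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);

    add_imported_service();

    pthread_mutex_destroy(&rsa_list_lock);
    return nested_lock_rc;
}
```

Calling run_import_chain() returns EDEADLK: the nested acquisition is rejected because the calling thread already owns the mutex, which is exactly the situation that hangs F1.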

Declaring the dependency on S1 as optional mitigates the issue: when F1 starts up, C1 does not have to wait for any dependency, so it can export S2 BEFORE the remote S1 service is imported. This does not, however, solve the underlying problem.

A solution would be to use a recursive mutex for rsaListLock. A recursive mutex lets the thread that already owns it lock it again, while still blocking every other thread until the owner has fully released it. In other words, a recursive mutex behaves like a normal mutex when accessed by different threads and like an "always-open" lock when accessed again by the thread that holds it.
In principle this solution should also be data-safe: the rsaList cannot be modified by anyone except the thread that holds the lock, and in this specific case the rsaList is only read.
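The proposed fix can be sketched as follows, again with plain POSIX threads and hypothetical stand-in names and data rather than Celix's real Topology Manager code. Here rsaListLock is assumed to be created with PTHREAD_MUTEX_RECURSIVE, so the nested acquisition on the import path succeeds, and a second thread checks with pthread_mutex_trylock that the mutex is free again once every lock has been matched by an unlock.

```c
#define _XOPEN_SOURCE 700 /* exposes PTHREAD_MUTEX_RECURSIVE in <pthread.h> */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the Topology Manager state, not Celix's real types. */
static pthread_mutex_t rsa_list_lock;
static const char *rsa_list[] = {"rsa_a", "rsa_b"}; /* hypothetical RSA entries */
static const size_t rsa_count = sizeof rsa_list / sizeof rsa_list[0];

/* Models topologyManager_addExportedService: a read-only walk over the RSA list. */
static size_t add_exported_service(void) {
    size_t visited = 0;
    if (pthread_mutex_lock(&rsa_list_lock) != 0) { /* lock count 1 -> 2 */
        return 0;
    }
    for (size_t i = 0; i < rsa_count; i++) {
        if (rsa_list[i] != NULL) {
            visited++; /* a real implementation would export through this RSA */
        }
    }
    pthread_mutex_unlock(&rsa_list_lock); /* count 2 -> 1, still owned */
    return visited;
}

/* Models topologyManager_addImportedService; the nested export runs on the same thread. */
static size_t add_imported_service(void) {
    pthread_mutex_lock(&rsa_list_lock);      /* count 0 -> 1 */
    size_t visited = add_exported_service(); /* re-entrant: no deadlock */
    pthread_mutex_unlock(&rsa_list_lock);    /* count 1 -> 0, released */
    return visited;
}

/* Runs on another thread: stores 0 if the mutex is free, EBUSY if it is still held. */
static void *probe_lock(void *out) {
    int rc = pthread_mutex_trylock(&rsa_list_lock);
    if (rc == 0) {
        pthread_mutex_unlock(&rsa_list_lock);
    }
    *(int *)out = rc;
    return NULL;
}

/* Returns how many RSAs the nested export visited; *probe_rc receives the
 * second thread's trylock result after the import path has finished. */
size_t run_with_recursive_lock(int *probe_rc) {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    int rc = pthread_mutex_init(&rsa_list_lock, &attr);
    assert(rc == 0);
    (void)rc;
    pthread_mutexattr_destroy(&attr);

    size_t visited = add_imported_service();

    pthread_t prober;
    pthread_create(&prober, NULL, probe_lock, probe_rc);
    pthread_join(prober, NULL);

    pthread_mutex_destroy(&rsa_list_lock);
    return visited;
}
```

A recursive mutex keeps a per-owner lock count, so each pthread_mutex_lock must be paired with its own pthread_mutex_unlock; other threads stay blocked until the count drops back to zero, which is what the trylock probe confirms.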


