You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2015/01/28 18:38:24 UTC

Review Request 30357: Deadlock Between Dependent Cluster/Service/Component/Host Implementations

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30357/
-----------------------------------------------------------

Review request for Ambari, Alejandro Fernandez, Mahadev Konar, Nate Cole, and Tom Beerbower.


Bugs: AMBARI-9368
    https://issues.apache.org/jira/browse/AMBARI-9368


Repository: ambari


Description
-------

A deadlock scenario was identified between ServiceComponentImpl and ServiceComponentHostImpl. The root cause was the pattern of locking that is used in ClusterImpl, ServiceImpl, ServiceComponentImpl and ServiceComponentHostImpl. Essentially, our pattern of using 2 locks per implementation is, in most cases, unnecessary and deadlock prone. 

ClusterImpl:
I've gone through and removed the internal lock on the cluster. The cluster is the highest level of logical organization and should only have 1 lock (hence the reason for a cluster global lock). It made no sense to keep another internal lock that either locked in parallel or contradicted the global lock. Furthermore, methods on primary keys such as clusterId don't even need to lock since their values won't change.

ServiceImpl:
Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. The clusterGlobalLock was removed from many methods where the JPA entity of the service was being written to; a cluster lock should not be used unless the state of the cluster is changing or being retrieved.

ServiceComponentImpl:
Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.

ServiceComponentHostImpl
Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.

Overall, the global cluster lock was left on methods that alter the topology of the cluster (add/delete of a service/component/host) or when refreshing the internal entity. The cluster global lock, however, once the cluster topology has changed. The internal locks were left in place where there was work being done around a non-thread-safe member or when an internal entity was having non-primary-key data set.


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java a31e42c 
  ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 85e8314 
  ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 0bfaa19 
  ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dbdef2b 
  ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java PRE-CREATION 

Diff: https://reviews.apache.org/r/30357/diff/


Testing
-------

Installed a small (5 node) and large (20 node) cluster and performed an end-to-end install with all services. In both cases, I had two browsers opened so there were interwoven requests during installations. 

In the small cluster, I forced a yum repository error on install so that I would need to retry the failed install. The cluster deleted and the services were repopulated on the retry.

After both clusters were installed, I execute an Add Service from the browser at the same time as a DELETE service request from a 2nd browser. In both cases, the requests were scheduled correctly and the services were propagated.

Testing also included adding new hosts, and adding new host components from 2 different browers simultaneously. 

New tests were added to cover threads performing read serialization on the impl's concurrently with impl creation and writes. These tests deadlocked before the changes.

[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 29:21 min
[INFO] Finished at: 2015-01-28T01:36:06-05:00
[INFO] Final Memory: 36M/443M
[INFO] ------------------------------------------------------------------------


Thanks,

Jonathan Hurley


Re: Review Request 30357: Deadlock Between Dependent Cluster/Service/Component/Host Implementations

Posted by Tom Beerbower <tb...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30357/#review70048
-----------------------------------------------------------

Ship it!


Wow, major clean up.  Nice!

- Tom Beerbower


On Jan. 28, 2015, 5:38 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30357/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2015, 5:38 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Mahadev Konar, Nate Cole, and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9368
>     https://issues.apache.org/jira/browse/AMBARI-9368
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> A deadlock scenario was identified between ServiceComponentImpl and ServiceComponentHostImpl. The root cause was the pattern of locking that is used in ClusterImpl, ServiceImpl, ServiceComponentImpl and ServiceComponentHostImpl. Essentially, our pattern of using 2 locks per implementation is, in most cases, unnecessary and deadlock prone. 
> 
> ClusterImpl:
> I've gone through and removed the internal lock on the cluster. The cluster is the highest level of logical organization and should only have 1 lock (hence the reason for a cluster global lock). It made no sense to keep another internal lock that either locked in parallel or contradicted the global lock. Furthermore, methods on primary keys such as clusterId don't even need to lock since their values won't change.
> 
> ServiceImpl:
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. The clusterGlobalLock was removed from many methods where the JPA entity of the service was being written to; a cluster lock should not be used unless the state of the cluster is changing or being retrieved.
> 
> ServiceComponentImpl:
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.
> 
> ServiceComponentHostImpl
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.
> 
> Overall, the global cluster lock was left on methods that alter the topology of the cluster (add/delete of a service/component/host) or when refreshing the internal entity. The cluster global lock, however, once the cluster topology has changed. The internal locks were left in place where there was work being done around a non-thread-safe member or when an internal entity was having non-primary-key data set.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java a31e42c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 85e8314 
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 0bfaa19 
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dbdef2b 
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30357/diff/
> 
> 
> Testing
> -------
> 
> Installed a small (5 node) and large (20 node) cluster and performed an end-to-end install with all services. In both cases, I had two browsers opened so there were interwoven requests during installations. 
> 
> In the small cluster, I forced a yum repository error on install so that I would need to retry the failed install. The cluster deleted and the services were repopulated on the retry.
> 
> After both clusters were installed, I execute an Add Service from the browser at the same time as a DELETE service request from a 2nd browser. In both cases, the requests were scheduled correctly and the services were propagated.
> 
> Testing also included adding new hosts, and adding new host components from 2 different browers simultaneously. 
> 
> New tests were added to cover threads performing read serialization on the impl's concurrently with impl creation and writes. These tests deadlocked before the changes.
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 29:21 min
> [INFO] Finished at: 2015-01-28T01:36:06-05:00
> [INFO] Final Memory: 36M/443M
> [INFO] ------------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>


Re: Review Request 30357: Deadlock Between Dependent Cluster/Service/Component/Host Implementations

Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30357/#review70049
-----------------------------------------------------------

Ship it!


Ship It!

- Nate Cole


On Jan. 28, 2015, 12:38 p.m., Jonathan Hurley wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30357/
> -----------------------------------------------------------
> 
> (Updated Jan. 28, 2015, 12:38 p.m.)
> 
> 
> Review request for Ambari, Alejandro Fernandez, Mahadev Konar, Nate Cole, and Tom Beerbower.
> 
> 
> Bugs: AMBARI-9368
>     https://issues.apache.org/jira/browse/AMBARI-9368
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> A deadlock scenario was identified between ServiceComponentImpl and ServiceComponentHostImpl. The root cause was the pattern of locking that is used in ClusterImpl, ServiceImpl, ServiceComponentImpl and ServiceComponentHostImpl. Essentially, our pattern of using 2 locks per implementation is, in most cases, unnecessary and deadlock prone. 
> 
> ClusterImpl:
> I've gone through and removed the internal lock on the cluster. The cluster is the highest level of logical organization and should only have 1 lock (hence the reason for a cluster global lock). It made no sense to keep another internal lock that either locked in parallel or contradicted the global lock. Furthermore, methods on primary keys such as clusterId don't even need to lock since their values won't change.
> 
> ServiceImpl:
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. The clusterGlobalLock was removed from many methods where the JPA entity of the service was being written to; a cluster lock should not be used unless the state of the cluster is changing or being retrieved.
> 
> ServiceComponentImpl:
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.
> 
> ServiceComponentHostImpl
> Locks around static data or primary key data such as service names, cluster IDs, etc, should not have locks around them since they will never change. In many cases, the clusterGlobalLock was removed for the same reasons as stated in ServiceImpl.
> 
> Overall, the global cluster lock was left on methods that alter the topology of the cluster (add/delete of a service/component/host) or when refreshing the internal entity. The cluster global lock, however, once the cluster topology has changed. The internal locks were left in place where there was work being done around a non-thread-safe member or when an internal entity was having non-primary-key data set.
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceComponentImpl.java a31e42c 
>   ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 85e8314 
>   ambari-server/src/main/java/org/apache/ambari/server/state/cluster/ClusterImpl.java 0bfaa19 
>   ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java dbdef2b 
>   ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/30357/diff/
> 
> 
> Testing
> -------
> 
> Installed a small (5 node) and large (20 node) cluster and performed an end-to-end install with all services. In both cases, I had two browsers opened so there were interwoven requests during installations. 
> 
> In the small cluster, I forced a yum repository error on install so that I would need to retry the failed install. The cluster deleted and the services were repopulated on the retry.
> 
> After both clusters were installed, I execute an Add Service from the browser at the same time as a DELETE service request from a 2nd browser. In both cases, the requests were scheduled correctly and the services were propagated.
> 
> Testing also included adding new hosts, and adding new host components from 2 different browers simultaneously. 
> 
> New tests were added to cover threads performing read serialization on the impl's concurrently with impl creation and writes. These tests deadlocked before the changes.
> 
> [INFO] ------------------------------------------------------------------------
> [INFO] BUILD SUCCESS
> [INFO] ------------------------------------------------------------------------
> [INFO] Total time: 29:21 min
> [INFO] Finished at: 2015-01-28T01:36:06-05:00
> [INFO] Final Memory: 36M/443M
> [INFO] ------------------------------------------------------------------------
> 
> 
> Thanks,
> 
> Jonathan Hurley
> 
>