You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ambari.apache.org by Jonathan Hurley <jh...@hortonworks.com> on 2015/02/24 06:36:09 UTC
Review Request 31341: Performance: Cluster Installation Deadlocks When
Setting Component States
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/
-----------------------------------------------------------
Review request for Ambari, Nate Cole and Tom Beerbower.
Bugs: AMBARI-9761
https://issues.apache.org/jira/browse/AMBARI-9761
Repository: ambari
Description
-------
Another case of misunderstanding how locks work.
During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive. Based on the thread dump, there exists a deadlock between:
- Cluster readers
- Cluster writers
- ServiceComponentHost writers
qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
The underlying problem is that a writeLock.lock() is parked which causes all subsequent readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original cluster readLock reader which blocks the cluster write)
Long story short is that I think we need to revisit locks again after 2.0.0; I just don't see a need for locking on reads in most places - that's what the database is doing for us.
Diffs
-----
ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c
ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea
ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c
ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443
Diff: https://reviews.apache.org/r/31341/diff/
Testing
-------
Reproduced the deadlock in a unit test first, and then verified the deadlock does not occur anymore in the test after applying the patch.
Thanks,
Jonathan Hurley
Re: Review Request 31341: Performance: Cluster Installation Deadlocks
When Setting Component States
Posted by Jonathan Hurley <jh...@hortonworks.com>.
> On Feb. 24, 2015, 7:23 a.m., Nate Cole wrote:
> > ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java, lines 413-417
> > <https://reviews.apache.org/r/31341/diff/1/?file=873469#file873469line413>
> >
> > Do we even need a read lock around the simple boolean check? There is a write lock around the persist() method which seems like the only place where it's required.
Thanks for the review! I was debating the same issue. I've left the locks in place that involve non-thread-safe members. In this case though, you're right that the `persist()` method is enough. Furthermore, this member is only ever set in the event that the service is brand new.
Overall, we really need to remove the locks from many of the read methods. I'm seriously debating removing the read locks around `convertToResponse()` but felt that this close to release, that's risky.
- Jonathan
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73801
-----------------------------------------------------------
On Feb. 24, 2015, 12:36 a.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2015, 12:36 a.m.)
>
>
> Review request for Ambari, Nate Cole and Tom Beerbower.
>
>
> Bugs: AMBARI-9761
> https://issues.apache.org/jira/browse/AMBARI-9761
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Another case of misunderstanding how locks work.
>
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive. Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
>
> qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
>
> The underlying problem is that a writeLock.lock() is parked which causes all subsequent readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original cluster readLock reader which blocks the cluster write)
>
> Long story short is that I think we need to revisit locks again after 2.0.0; I just don't see a need for locking on reads in most places - that's what the database is doing for us.
>
>
> Diffs
> -----
>
> ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c
> ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea
> ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c
> ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443
>
> Diff: https://reviews.apache.org/r/31341/diff/
>
>
> Testing
> -------
>
> Reproduced the deadlock in a unit test first, and then verified the deadlock does not occur anymore in the test after applying the patch.
>
>
> Thanks,
>
> Jonathan Hurley
>
>
Re: Review Request 31341: Performance: Cluster Installation Deadlocks
When Setting Component States
Posted by Nate Cole <nc...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73801
-----------------------------------------------------------
Ship it!
ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java
<https://reviews.apache.org/r/31341/#comment120214>
Do we even need a read lock around the simple boolean check? There is a write lock around the persist() method which seems like the only place where it's required.
- Nate Cole
On Feb. 24, 2015, 12:36 a.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2015, 12:36 a.m.)
>
>
> Review request for Ambari, Nate Cole and Tom Beerbower.
>
>
> Bugs: AMBARI-9761
> https://issues.apache.org/jira/browse/AMBARI-9761
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Another case of misunderstanding how locks work.
>
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive. Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
>
> qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
>
> The underlying problem is that a writeLock.lock() is parked which causes all subsequent readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original cluster readLock reader which blocks the cluster write)
>
> Long story short is that I think we need to revisit locks again after 2.0.0; I just don't see a need for locking on reads in most places - that's what the database is doing for us.
>
>
> Diffs
> -----
>
> ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c
> ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea
> ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c
> ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443
>
> Diff: https://reviews.apache.org/r/31341/diff/
>
>
> Testing
> -------
>
> Reproduced the deadlock in a unit test first, and then verified the deadlock does not occur anymore in the test after applying the patch.
>
>
> Thanks,
>
> Jonathan Hurley
>
>
Re: Review Request 31341: Performance: Cluster Installation Deadlocks
When Setting Component States
Posted by Tom Beerbower <tb...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/31341/#review73795
-----------------------------------------------------------
Ship it!
Ship It!
- Tom Beerbower
On Feb. 24, 2015, 5:36 a.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/31341/
> -----------------------------------------------------------
>
> (Updated Feb. 24, 2015, 5:36 a.m.)
>
>
> Review request for Ambari, Nate Cole and Tom Beerbower.
>
>
> Bugs: AMBARI-9761
> https://issues.apache.org/jira/browse/AMBARI-9761
>
>
> Repository: ambari
>
>
> Description
> -------
>
> Another case of misunderstanding how locks work.
>
> During provisioning of a cluster with at least 200 hosts, Ambari Server becomes unresponsive. Based on the thread dump, there exists a deadlock between:
> - Cluster readers
> - Cluster writers
> - ServiceComponentHost writers
>
> qtp626652285-97 ClusterImpl.convertToResponse() (cluster readLock)
> qtp1282624353-47 ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
> qtp626652285-97 ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
> qtp1282624353-60 ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
> qtp1282624353-47 ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED by qtp1282624353-60)
>
> The underlying problem is that a writeLock.lock() is parked which causes all subsequent readLock.lock() requests to also park. This includes the request from qtp1282624353-47 which is holding a writeLock on the SCH which, in turn, is blocking qtp626652285-97 (the original cluster readLock reader which blocks the cluster write)
>
> Long story short is that I think we need to revisit locks again after 2.0.0; I just don't see a need for locking on reads in most places - that's what the database is doing for us.
>
>
> Diffs
> -----
>
> ambari-server/src/main/java/org/apache/ambari/server/events/listeners/upgrade/StackVersionListener.java 117526c
> ambari-server/src/main/java/org/apache/ambari/server/state/ServiceImpl.java 0de62ea
> ambari-server/src/main/java/org/apache/ambari/server/state/svccomphost/ServiceComponentHostImpl.java c43044c
> ambari-server/src/test/java/org/apache/ambari/server/state/cluster/ClusterDeadlockTest.java 96a1443
>
> Diff: https://reviews.apache.org/r/31341/diff/
>
>
> Testing
> -------
>
> Reproduced the deadlock in a unit test first, and then verified the deadlock does not occur anymore in the test after applying the patch.
>
>
> Thanks,
>
> Jonathan Hurley
>
>