Posted to dev@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2015/02/24 06:22:12 UTC

[jira] [Created] (AMBARI-9761) Performance: Cluster Installation Deadlocks When Setting Component States

Jonathan Hurley created AMBARI-9761:
---------------------------------------

             Summary: Performance: Cluster Installation Deadlocks When Setting Component States
                 Key: AMBARI-9761
                 URL: https://issues.apache.org/jira/browse/AMBARI-9761
             Project: Ambari
          Issue Type: Bug
          Components: ambari-server
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
            Priority: Critical
             Fix For: 2.0.0
         Attachments: jstack2

While provisioning a cluster with at least 200 hosts, Ambari Server becomes unresponsive. The thread dump shows a deadlock involving:

- Cluster readers
- Cluster writers
- ServiceComponentHost writers

{noformat}
qtp626652285-97   ClusterImpl.convertToResponse() (cluster readLock)
qtp1282624353-47  ServiceComponentHostImpl.setRestartRequired() (sch writeLock)
qtp626652285-97   ServiceComponentHostImpl.getMaintenanceState() (sch readLock BLOCKED by qtp1282624353-47)
qtp1282624353-60  ClusterImpl.recalculateClusterVersionState() (cluster writeLock BLOCKED by qtp626652285-97)
qtp1282624353-47  ServiceComponentHostImpl.isPersisted() (cluster readLock BLOCKED behind queued writer qtp1282624353-60)

"qtp626652285-97" prio=10 tid=0x00007f2e2803a800 nid=0x5a3f waiting on condition [0x00007f2df17cd000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079ebb1130> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.getMaintenanceState(ServiceComponentHostImpl.java:1437)
	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:208)
	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:177)
	at org.apache.ambari.server.controller.MaintenanceStateHelper.getEffectiveState(MaintenanceStateHelper.java:191)
	at org.apache.ambari.server.state.cluster.ClusterImpl.getClusterHealthReport(ClusterImpl.java:2422)
	at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1606)

"qtp1282624353-47" prio=10 tid=0x00007f2e08015800 nid=0x59c2 waiting on condition [0x00007f2df37ef000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.isPersisted(ServiceComponentHostImpl.java:1153)
	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.saveIfPersisted(ServiceComponentHostImpl.java:1266)
	at org.apache.ambari.server.state.svccomphost.ServiceComponentHostImpl.setRestartRequired(ServiceComponentHostImpl.java:1480)
	at org.apache.ambari.server.agent.HeartBeatHandler.processCommandReports(HeartBeatHandler.java:546)
	at org.apache.ambari.server.agent.HeartBeatHandler.handleHeartBeat(HeartBeatHandler.java:253)
	at org.apache.ambari.server.agent.rest.AgentResource.heartbeat(AgentResource.java:123)


"qtp1282624353-60" prio=10 tid=0x00007f2dfc014800 nid=0x59cf waiting on condition [0x00007f2df2ae1000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:945)
	at org.apache.ambari.server.state.cluster.ClusterImpl.recalculateClusterVersionState(ClusterImpl.java:1180)
	at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:81)
	at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
	at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
	at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
	at com.google.common.eventbus.EventBus.post(EventBus.java:267)

"qtp1282624353-109" prio=10 tid=0x00007f2df8001000 nid=0x5a52 waiting on condition [0x00007f2df2be2000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079b474368> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
	at org.apache.ambari.server.events.listeners.upgrade.StackVersionListener.onAmbariEvent(StackVersionListener.java:76)
	at sun.reflect.GeneratedMethodAccessor52.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.google.common.eventbus.EventHandler.handleEvent(EventHandler.java:74)
	at com.google.common.eventbus.EventBus.dispatch(EventBus.java:314)
	at com.google.common.eventbus.EventBus.dispatchQueuedEvents(EventBus.java:296)
	at com.google.common.eventbus.EventBus.post(EventBus.java:267)

"qtp626652285-106" prio=10 tid=0x00007f2e28026000 nid=0x5a4a waiting on condition [0x00007f2df3efa000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000079bf800d8> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireShared(AbstractQueuedSynchronizer.java:964)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireShared(AbstractQueuedSynchronizer.java:1282)
	at java.util.concurrent.locks.ReentrantReadWriteLock$ReadLock.lock(ReentrantReadWriteLock.java:731)
	at org.apache.ambari.server.state.cluster.ClusterImpl.convertToResponse(ClusterImpl.java:1602)
	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:861)
	at org.apache.ambari.server.controller.AmbariManagementControllerImpl.getClusters(AmbariManagementControllerImpl.java:2563)
	at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:182)
	at org.apache.ambari.server.controller.internal.ClusterResourceProvider$1.invoke(ClusterResourceProvider.java:179)
	at org.apache.ambari.server.controller.internal.AbstractResourceProvider.getResources(AbstractResourceProvider.java:302)
	at org.apache.ambari.server.controller.internal.ClusterResourceProvider.getResources(ClusterResourceProvider.java:179)
	at org.apache.ambari.server.controller.internal.ClusterControllerImpl$ExtendedResourceProviderWrapper.queryForResources(ClusterControllerImpl.java:945)
	at org.apache.ambari.server.controller.internal.ClusterControllerImpl.getResources(ClusterControllerImpl.java:132)
{noformat}
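
The cycle above closes only because a reader can be held up on a ReentrantReadWriteLock that no writer currently owns: once a writer is parked at the head of the wait queue, later readers queue behind it. The following is a minimal sketch of that behavior in isolation, not Ambari code. The class name, method name, and the {{clusterLock}} variable are hypothetical, and the 200 ms timeout is an arbitrary choice for the demonstration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class; names are illustrative and do not come from Ambari.
final class QueuedWriterBlocksReader {

    /**
     * Returns true if a second reader obtained the read lock while another
     * reader held it and a writer was already waiting in the queue.
     */
    static boolean secondReaderAcquires() throws InterruptedException {
        // Default (nonfair) mode, matching the NonfairSync seen in the dump.
        ReentrantReadWriteLock clusterLock = new ReentrantReadWriteLock();
        CountDownLatch readerHolds = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        // First reader: takes the read lock and keeps it until released.
        Thread reader = new Thread(() -> {
            clusterLock.readLock().lock();
            try {
                readerHolds.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                clusterLock.readLock().unlock();
            }
        });
        reader.start();
        readerHolds.await();

        // Writer: parks in the wait queue because a reader holds the lock.
        Thread writer = new Thread(() -> {
            clusterLock.writeLock().lock();
            clusterLock.writeLock().unlock();
        });
        writer.start();
        while (!clusterLock.hasQueuedThread(writer)) {
            Thread.sleep(10);
        }

        // Second reader (this thread): a timed tryLock honors the queue,
        // unlike the untimed tryLock(), which may barge past waiters.
        boolean acquired = clusterLock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            clusterLock.readLock().unlock();
        }

        // Let the first reader go so the writer can finish and threads exit.
        release.countDown();
        reader.join();
        writer.join();
        return acquired;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second reader acquired: " + secondReaderAcquires());
    }
}
```

In this sketch the second reader should time out even though only a read lock is held. That corresponds to qtp1282624353-47 parking on the cluster readLock while qtp1282624353-60 waits for the cluster writeLock. In the dump there is no timeout, and the first reader (qtp626652285-97) is itself waiting on the sch lock held by qtp1282624353-47, so none of the three threads can proceed.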


