Posted to dev@ambari.apache.org by "Ximo Guanter (JIRA)" <ji...@apache.org> on 2014/04/15 12:36:14 UTC
[jira] [Updated] (AMBARI-4930) Ambari initialization problems after upgrade to 1.4.1

     [ https://issues.apache.org/jira/browse/AMBARI-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ximo Guanter updated AMBARI-4930:
---------------------------------

    Attachment: AMBARI-4930-1.patch

It seems that cluster initialization cannot safely run at the same time as the heartbeat monitoring process. Moving the line that starts the heartbeat monitoring process so that it runs after cluster initialization appears to solve the problem (see AMBARI-4930-1.patch).
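
As a rough illustration of that reordering (this is not the content of AMBARI-4930-1.patch, and every class and method name below is hypothetical rather than taken from the Ambari code base), the idea is simply to finish initialization on the main thread before the monitor thread is started:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch; none of these names come from the Ambari code base.
final class StartupOrderSketch {

    // Runs the two startup steps and returns the order in which they ran.
    static List<String> runStartup() {
        List<String> events = new CopyOnWriteArrayList<>();

        // Step 1: finish cluster initialization on the main thread first.
        initializeClusters(events);

        // Step 2: only now start the background heartbeat monitor, so it
        // cannot touch shared state while initialization is still running.
        Thread monitor = new Thread(
                () -> events.add("heartbeat-monitor-started"),
                "heartbeat-monitor");
        monitor.start();
        try {
            // Joined here only to make the sketch deterministic; a real
            // monitor would keep running in the background.
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during startup", e);
        }
        return events;
    }

    // Stand-in for loading clusters and services from the database.
    private static void initializeClusters(List<String> events) {
        events.add("clusters-initialized");
    }
}
```

Sequencing the two steps avoids the overlap but does not make either one safe to run concurrently with the other.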

This should not be the final fix for this issue, since the heartbeat monitoring process and cluster initialization should be able to run at the same time, but the attached patch provides a workaround that unblocks it.

My uneducated guess is that the root problem might be in the locking mechanism of ClusterImpl.java, but I don't have a deep enough understanding of the different classes and locks to pinpoint the root cause.

> Ambari initialization problems after upgrade to 1.4.1
> -----------------------------------------------------
>
>                 Key: AMBARI-4930
>                 URL: https://issues.apache.org/jira/browse/AMBARI-4930
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 1.4.1
>            Reporter: Ximo Guanter
>         Attachments: AMBARI-4930-1.patch
>
>
> Starting the Ambari Server sometimes fails with the following error:
> {code}
> 04:44:56,972  INFO [main] Configuration:511 - Web App DIR test /usr/lib/ambari-server/web
> 04:44:56,975  INFO [main] CertificateManager:70 - Initialization of root certificate
> 04:44:56,975  INFO [main] CertificateManager:72 - Certificate exists:true
> 04:44:57,003  INFO [main] AmbariServer:338 - ********* Initializing Clusters **********
> 04:44:57,285  WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute02.hi.inet
> 04:44:57,295  WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute03.hi.inet
> 04:44:57,296  WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute06.hi.inet
> 04:44:57,296  WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute04.hi.inet
> 04:44:57,297  WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-data99.hi.inet
> 04:44:57,318 ERROR [main] AmbariServer:461 - Failed to run the Ambari Server
> Local Exception Stack:
> Exception [EclipseLink-2004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.ConcurrencyException
> Exception Description: A signal was attempted before wait() on ConcurrencyManager. This normally means that an attempt was made to
> commit or rollback a transaction before it was started, or to rollback a transaction twice.
>         at org.eclipse.persistence.exceptions.ConcurrencyException.signalAttemptedBeforeWait(ConcurrencyException.java:84)
>         at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:489)
>         at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:392)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1022)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:933)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:193)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:121)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3906)
>         at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3861)
>         at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:296)
>         at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:309)
>         at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:214)
>         at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:222)
>         at org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder.buildCloneFor(UnitOfWorkQueryValueHolder.java:56)
>         at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:161)
>         at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:222)
>         at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:88)
>         at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:244)
>         at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:415)
>         at org.eclipse.persistence.indirection.IndirectList.isEmpty(IndirectList.java:490)
>         at org.apache.ambari.server.state.ServiceImpl.<init>(ServiceImpl.java:125)
>         at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e.<init>(<generated>)
>         at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e$$FastClassByGuice$$1c1221ad.newInstance(<generated>)
>         at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
>         at com.google.inject.internal.ProxyFactory$ProxyConstructor.newInstance(ProxyFactory.java:260)
>         at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
>         at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
>         at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
>         at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
>         at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
>         at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
>         at $Proxy12.createExisting(Unknown Source)
>         at org.apache.ambari.server.state.cluster.ClusterImpl.loadServices(ClusterImpl.java:218)
>         at org.apache.ambari.server.state.cluster.ClusterImpl.debugDump(ClusterImpl.java:808)
>         at org.apache.ambari.server.state.cluster.ClustersImpl.debugDump(ClustersImpl.java:566)
>         at org.apache.ambari.server.controller.AmbariServer.run(AmbariServer.java:341)
>         at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:458)
> {code}
> The issue seems to be related to the amount of data in the {{ambarirca}} database: it reproduces 80-90% of the time when we try to start the ambari-server in an environment where that DB is 1GB+, and it almost never reproduces in environments with a small DB.
> Running the {{VACUUM FULL}} command does not help minimize the problem.


