Posted to dev@ambari.apache.org by "Ximo Guanter (JIRA)" <ji...@apache.org> on 2014/04/15 12:36:14 UTC
[jira] [Updated] (AMBARI-4930) Ambari initialization problems after
upgrade to 1.4.1
[ https://issues.apache.org/jira/browse/AMBARI-4930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ximo Guanter updated AMBARI-4930:
---------------------------------
Attachment: AMBARI-4930-1.patch
It seems like the cluster initialization cannot be run at the same time as the heartbeat monitoring process. Just moving the line that starts the heartbeat monitoring process after cluster initialization seems to solve the problem (see AMBARI-4930-1.patch).
Note that this should not be the final fix for this issue, since the heartbeat monitoring process and cluster initialization should be able to run at the same time, but the attached patch provides a workaround that unblocks it.
My uneducated guess is that the root problem might be in the locking mechanism of ClusterImpl.java, but I don't have a deep enough understanding of the different classes and locks to pinpoint the root cause.
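As a rough illustration of the reordering workaround described in this comment, here is a minimal, self-contained sketch. It is not the content of AMBARI-4930-1.patch and does not use Ambari's real classes: StartupOrderSketch, initializeClusters, startHeartbeatMonitor and runStartup are hypothetical names. It only shows the idea of finishing cluster initialization on the main thread before the monitoring thread is started.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch: all names here are illustrative, not Ambari APIs.
public class StartupOrderSketch {
    // Thread-safe record of the order in which startup steps ran.
    static final List<String> events =
            Collections.synchronizedList(new ArrayList<String>());

    // Stand-in for the cluster initialization done on the main thread.
    static void initializeClusters() {
        events.add("clusters-initialized");
    }

    // Stand-in for starting the heartbeat monitoring thread.
    static Thread startHeartbeatMonitor(final CountDownLatch started) {
        Thread t = new Thread(new Runnable() {
            @Override
            public void run() {
                events.add("heartbeat-monitor-started");
                started.countDown();
            }
        }, "heartbeat-monitor");
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Workaround ordering: initialize clusters first, then start the monitor,
    // so the two never touch shared state concurrently during startup.
    static List<String> runStartup() {
        events.clear();
        initializeClusters();
        CountDownLatch started = new CountDownLatch(1);
        startHeartbeatMonitor(started);
        try {
            started.await(); // wait until the monitor thread has actually run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during startup", e);
        }
        return new ArrayList<String>(events);
    }

    public static void main(String[] args) {
        System.out.println(runStartup());
    }
}
```

This only serializes the two steps at startup. As noted above, it sidesteps the race rather than fixing whatever locking problem allows it.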
> Ambari initialization problems after upgrade to 1.4.1
> -----------------------------------------------------
>
> Key: AMBARI-4930
> URL: https://issues.apache.org/jira/browse/AMBARI-4930
> Project: Ambari
> Issue Type: Bug
> Affects Versions: 1.4.1
> Reporter: Ximo Guanter
> Attachments: AMBARI-4930-1.patch
>
>
> Starting the Ambari Server sometimes fails with the following error:
> {code}
> 04:44:56,972 INFO [main] Configuration:511 - Web App DIR test /usr/lib/ambari-server/web
> 04:44:56,975 INFO [main] CertificateManager:70 - Initialization of root certificate
> 04:44:56,975 INFO [main] CertificateManager:72 - Certificate exists:true
> 04:44:57,003 INFO [main] AmbariServer:338 - ********* Initializing Clusters **********
> 04:44:57,285 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute02.hi.inet
> 04:44:57,295 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute03.hi.inet
> 04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute06.hi.inet
> 04:44:57,296 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-compute04.hi.inet
> 04:44:57,297 WARN [Thread-2] HeartbeatMonitor:123 - Heartbeat lost from host andromeda-data99.hi.inet
> 04:44:57,318 ERROR [main] AmbariServer:461 - Failed to run the Ambari Server
> Local Exception Stack:
> Exception [EclipseLink-2004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.ConcurrencyException
> Exception Description: A signal was attempted before wait() on ConcurrencyManager. This normally means that an attempt was made to
> commit or rollback a transaction before it was started, or to rollback a transaction twice.
> at org.eclipse.persistence.exceptions.ConcurrencyException.signalAttemptedBeforeWait(ConcurrencyException.java:84)
> at org.eclipse.persistence.internal.helper.ConcurrencyManager.releaseReadLock(ConcurrencyManager.java:489)
> at org.eclipse.persistence.internal.identitymaps.CacheKey.releaseReadLock(CacheKey.java:392)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:1022)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.cloneAndRegisterObject(UnitOfWorkImpl.java:933)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getAndCloneCacheKeyFromParent(UnitOfWorkIdentityMapAccessor.java:193)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkIdentityMapAccessor.getFromIdentityMap(UnitOfWorkIdentityMapAccessor.java:121)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3906)
> at org.eclipse.persistence.internal.sessions.UnitOfWorkImpl.registerExistingObject(UnitOfWorkImpl.java:3861)
> at org.eclipse.persistence.mappings.CollectionMapping.buildElementUnitOfWorkClone(CollectionMapping.java:296)
> at org.eclipse.persistence.mappings.CollectionMapping.buildElementClone(CollectionMapping.java:309)
> at org.eclipse.persistence.internal.queries.ContainerPolicy.addNextValueFromIteratorInto(ContainerPolicy.java:214)
> at org.eclipse.persistence.mappings.CollectionMapping.buildCloneForPartObject(CollectionMapping.java:222)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkQueryValueHolder.buildCloneFor(UnitOfWorkQueryValueHolder.java:56)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiateImpl(UnitOfWorkValueHolder.java:161)
> at org.eclipse.persistence.internal.indirection.UnitOfWorkValueHolder.instantiate(UnitOfWorkValueHolder.java:222)
> at org.eclipse.persistence.internal.indirection.DatabaseValueHolder.getValue(DatabaseValueHolder.java:88)
> at org.eclipse.persistence.indirection.IndirectList.buildDelegate(IndirectList.java:244)
> at org.eclipse.persistence.indirection.IndirectList.getDelegate(IndirectList.java:415)
> at org.eclipse.persistence.indirection.IndirectList.isEmpty(IndirectList.java:490)
> at org.apache.ambari.server.state.ServiceImpl.<init>(ServiceImpl.java:125)
> at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e.<init>(<generated>)
> at org.apache.ambari.server.state.ServiceImpl$$EnhancerByGuice$$807a405e$$FastClassByGuice$$1c1221ad.newInstance(<generated>)
> at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
> at com.google.inject.internal.ProxyFactory$ProxyConstructor.newInstance(ProxyFactory.java:260)
> at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
> at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
> at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
> at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
> at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
> at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
> at $Proxy12.createExisting(Unknown Source)
> at org.apache.ambari.server.state.cluster.ClusterImpl.loadServices(ClusterImpl.java:218)
> at org.apache.ambari.server.state.cluster.ClusterImpl.debugDump(ClusterImpl.java:808)
> at org.apache.ambari.server.state.cluster.ClustersImpl.debugDump(ClustersImpl.java:566)
> at org.apache.ambari.server.controller.AmbariServer.run(AmbariServer.java:341)
> at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:458)
> {code}
> The issue seems to be related to the amount of data in the {{ambarirca}} database: it reproduces 80-90% of the time when we try to start ambari-server in an environment where that DB is 1GB+, and it basically never reproduces in environments with a small DB.
> Running the {{VACUUM FULL}} command does not help minimize the problem.