Posted to issues@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2017/03/31 16:33:42 UTC
[jira] [Created] (AMBARI-20646) Large Long Running Requests Can Slow Down the ActionScheduler
Jonathan Hurley created AMBARI-20646:
----------------------------------------
Summary: Large Long Running Requests Can Slow Down the ActionScheduler
Key: AMBARI-20646
URL: https://issues.apache.org/jira/browse/AMBARI-20646
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.4.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Priority: Critical
Fix For: 2.5.1
When creating a massive request (for example, a rolling upgrade on a cluster with 1000 nodes), the size of the request seems to slow down the {{ActionScheduler}}. Each command, including server-side tasks, was taking between 1 and 2 minutes to run.
The cause of this can be seen in the following two stack traces:
{code:title=ActionSchedulerImpl}
at org.apache.ambari.server.orm.dao.DaoUtils.selectList(DaoUtils.java:60)
at org.apache.ambari.server.orm.dao.HostRoleCommandDAO.findByPKs(HostRoleCommandDAO.java:293)
at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.CGLIB$findByPKs$7(<generated>)
at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1$$FastClassByGuice$$aa975e7f.invoke(<generated>)
at com.google.inject.internal.cglib.proxy.$MethodProxy.invokeSuper(MethodProxy.java:228)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at org.apache.ambari.server.orm.AmbariLocalSessionInterceptor.invoke(AmbariLocalSessionInterceptor.java:53)
at com.google.inject.internal.InterceptorStackCallback$InterceptedMethodInvocation.proceed(InterceptorStackCallback.java:72)
at com.google.inject.internal.InterceptorStackCallback.intercept(InterceptorStackCallback.java:52)
at org.apache.ambari.server.orm.dao.HostRoleCommandDAO$$EnhancerByGuice$$21789cd1.findByPKs(<generated>)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getStagesInProgress(ActionDBAccessorImpl.java:303)
at org.apache.ambari.server.actionmanager.ActionScheduler.doWork(ActionScheduler.java:341)
at org.apache.ambari.server.actionmanager.ActionScheduler.run(ActionScheduler.java:302)
at java.lang.Thread.run(Thread.java:745)
{code}
{code:title=Server Action Executor}
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:700)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getTasks(ActionDBAccessorImpl.java:84)
at org.apache.ambari.server.actionmanager.Stage.<init>(Stage.java:157)
at org.apache.ambari.server.actionmanager.StageFactoryImpl.createExisting(StageFactoryImpl.java:72)
at org.apache.ambari.server.actionmanager.Request.<init>(Request.java:199)
at org.apache.ambari.server.actionmanager.Request$$FastClassByGuice$$9071e03.newInstance(<generated>)
at com.google.inject.internal.cglib.reflect.$FastConstructor.newInstance(FastConstructor.java:40)
at com.google.inject.internal.DefaultConstructionProxyFactory$1.newInstance(DefaultConstructionProxyFactory.java:60)
at com.google.inject.internal.ConstructorInjector.construct(ConstructorInjector.java:85)
at com.google.inject.internal.ConstructorBindingImpl$Factory.get(ConstructorBindingImpl.java:254)
at com.google.inject.internal.InjectorImpl$4$1.call(InjectorImpl.java:978)
at com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1024)
at com.google.inject.internal.InjectorImpl$4.get(InjectorImpl.java:974)
at com.google.inject.assistedinject.FactoryProvider2.invoke(FactoryProvider2.java:632)
at com.sun.proxy.$Proxy26.createExisting(Unknown Source)
at org.apache.ambari.server.actionmanager.ActionDBAccessorImpl.getRequests(ActionDBAccessorImpl.java:784)
at org.apache.ambari.server.serveraction.ServerActionExecutor.cleanRequestShareDataContexts(ServerActionExecutor.java:259)
- locked <0x00007ff0a14083c8> (a java.util.HashMap)
at org.apache.ambari.server.serveraction.ServerActionExecutor.doWork(ServerActionExecutor.java:454)
at org.apache.ambari.server.serveraction.ServerActionExecutor$1.run(ServerActionExecutor.java:160)
at java.lang.Thread.run(Thread.java:745)
{code}
It's clear from these stacks that every {{PENDING}} stage (roughly 15,000), along with its accompanying tasks, was being loaded into memory every second. This is unnecessary: stages within a single request run synchronously, so these methods only need the _next_ stage of each request, not all of its stages.
The proposed solution is to fix the {{StageEntity.findByCommandStatuses}} call so that it returns only the lowest-numbered matching stage for each request instead of every stage:
{code}
SELECT stage.requestid,
       MIN(stage.stageid)
FROM stageentity stage,
     hostrolecommandentity hrc
WHERE hrc.status IN :statuses
  AND hrc.stageid = stage.stageid
  AND hrc.requestid = stage.requestid
GROUP BY stage.requestid
{code}
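To make the intent of the query concrete, here is a minimal in-memory sketch of the same grouping in Java. It is not Ambari code: the {{NextStageSketch}} and {{CommandRow}} names are hypothetical, and the status strings are illustrative values chosen for the example. It only shows that grouping by request and taking the minimum stage ID yields one "next" stage per request rather than every pending stage.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; none of these types exist in Ambari.
public class NextStageSketch {

    /** Assumed flattened row: one host role command plus the stage and request it belongs to. */
    static final class CommandRow {
        final long requestId;
        final long stageId;
        final String status;

        CommandRow(long requestId, long stageId, String status) {
            this.requestId = requestId;
            this.stageId = stageId;
            this.status = status;
        }
    }

    /**
     * In-memory equivalent of:
     *   SELECT requestid, MIN(stageid) ... WHERE status IN :statuses GROUP BY requestid
     * Returns a map of request ID to the lowest stage ID that has a matching command.
     */
    static Map<Long, Long> nextStagePerRequest(List<CommandRow> rows, Set<String> statuses) {
        Map<Long, Long> next = new TreeMap<>();
        for (CommandRow row : rows) {
            if (statuses.contains(row.status)) {
                // Keep only the smallest stage ID seen so far for this request.
                next.merge(row.requestId, row.stageId, Long::min);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<CommandRow> rows = Arrays.asList(
            new CommandRow(1L, 1L, "COMPLETED"),
            new CommandRow(1L, 2L, "IN_PROGRESS"),
            new CommandRow(1L, 3L, "PENDING"),
            new CommandRow(1L, 4L, "PENDING"),
            new CommandRow(2L, 7L, "PENDING"),
            new CommandRow(2L, 8L, "PENDING"));
        // Illustrative status filter; the real set passed as :statuses may differ.
        Set<String> statuses = new HashSet<>(Arrays.asList("PENDING", "QUEUED", "IN_PROGRESS"));

        // Prints {1=2, 2=7}: one stage per request instead of all five matching rows.
        System.out.println(nextStagePerRequest(rows, statuses));
    }
}
```

With this shape, the number of stages the scheduler has to materialize scales with the number of active requests rather than with the total number of pending stages in each request.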
*Note that this might not appear on trunk due to AMBARI-18868*