Posted to dev@ambari.apache.org by "Jayush Luniya (JIRA)" <ji...@apache.org> on 2015/09/10 21:34:46 UTC
[jira] [Commented] (AMBARI-13065) RU: Core Slaves restart schedule is extremely slow on very large cluster
[ https://issues.apache.org/jira/browse/AMBARI-13065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739435#comment-14739435 ]
Jayush Luniya commented on AMBARI-13065:
----------------------------------------
The slowdown is in getStagesInProgress() and getAllStages() in ActionDBAccessorImpl.java. Converting a StageEntity to a Stage takes ~30 ms, so converting ~3000 StageEntity objects takes ~90 seconds. Since these conversions are IO-bound, the for loop can be parallelized. I prototyped a solution, and with it the same work could be done in ~11 seconds.
{code}
@Override
public List<Stage> getAllStages(long requestId) {
  List<Stage> stages = new ArrayList<Stage>();
  // Each StageEntity is converted sequentially; this loop is the slow part.
  for (StageEntity stageEntity : stageDAO.findByRequestId(requestId)) {
    stages.add(stageFactory.createExisting(stageEntity));
  }
  return stages;
}
{code}
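The prototype mentioned above is not included in this comment. As a rough illustration only, one way to parallelize such a loop is to submit each conversion to a fixed-size thread pool and collect the results in input order. The class name StageConversion, the method convertAll, and the thread-count parameter are hypothetical and are not part of Ambari. The sketch also assumes the conversion function is safe to call from several threads at once, which this comment does not establish for stageFactory.createExisting().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper, not Ambari code: runs an IO-bound conversion for each
// input element on a fixed-size thread pool.
final class StageConversion {

  private StageConversion() {
  }

  // Applies 'converter' to every entity in parallel and returns the results
  // in the same order as the input list.
  // Assumption: 'converter' is thread-safe.
  static <E, S> List<S> convertAll(List<E> entities,
                                   Function<E, S> converter,
                                   int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      // One task per entity.
      List<Callable<S>> tasks = new ArrayList<>(entities.size());
      for (E entity : entities) {
        tasks.add(() -> converter.apply(entity));
      }

      // invokeAll blocks until every task has completed and returns the
      // futures in the same order as the submitted tasks.
      List<S> results = new ArrayList<>(entities.size());
      for (Future<S> future : pool.invokeAll(tasks)) {
        results.add(future.get());
      }
      return results;
    } catch (InterruptedException e) {
      // Restore the interrupt flag before propagating the failure.
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while converting", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("Conversion failed", e.getCause());
    } finally {
      pool.shutdown();
    }
  }
}
```

Under these assumptions, the loop body in getAllStages() could in principle become a call such as StageConversion.convertAll(entities, stageFactory::createExisting, 8), where the pool size of 8 is an arbitrary example value. The actual speedup would depend on how many concurrent database operations the server can sustain.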
> RU: Core Slaves restart schedule is extremely slow on very large cluster
> ------------------------------------------------------------------------
>
> Key: AMBARI-13065
> URL: https://issues.apache.org/jira/browse/AMBARI-13065
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.1.2
> Reporter: Jayush Luniya
> Assignee: Jayush Luniya
> Priority: Blocker
> Fix For: 2.1.2
>
>
> Performed RU on a 1200-node cluster, and the progress of the 'Core Slaves' restarts is extremely slow: in 3 hours it restarted only 22 components (screenshot attached). At this rate it will take weeks for RU to complete.
> If we look into the agent log where RU core-slaves finished, we see that sequential commands are sent 8 minutes apart, which is very slow. The commands themselves execute in under a minute.