Posted to yarn-issues@hadoop.apache.org by "Shane Kumpf (JIRA)" <ji...@apache.org> on 2016/10/27 19:03:59 UTC
[jira] [Updated] (YARN-4381) Optimize container metrics in NodeManagerMetrics
[ https://issues.apache.org/jira/browse/YARN-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shane Kumpf updated YARN-4381:
------------------------------
Component/s: metrics
> Optimize container metrics in NodeManagerMetrics
> ------------------------------------------------
>
> Key: YARN-4381
> URL: https://issues.apache.org/jira/browse/YARN-4381
> Project: Hadoop YARN
> Issue Type: Improvement
> Components: metrics, nodemanager
> Affects Versions: 2.7.1
> Reporter: Yiqun Lin
> Assignee: Yiqun Lin
> Labels: oct16-medium
> Attachments: YARN-4381.001.patch, YARN-4381.002.patch, YARN-4381.003.patch
>
>
> Recently I found an issue with the NodeManager metrics: {{NodeManagerMetrics#containersLaunched}} does not actually count the number of containers that were successfully launched. A container can still fail after it has been accepted, for example when it receives a kill command or when its localization fails, and in those cases the result is a failed container. Currently, however, the counter is incremented in the code below whether the container later starts successfully or fails.
> {code}
> Credentials credentials = parseCredentials(launchContext);
> Container container =
>     new ContainerImpl(getConfig(), this.dispatcher,
>         context.getNMStateStore(), launchContext,
>         credentials, metrics, containerTokenIdentifier);
> ApplicationId applicationID =
>     containerId.getApplicationAttemptId().getApplicationId();
> if (context.getContainers().putIfAbsent(containerId, container) != null) {
>   NMAuditLogger.logFailure(user, AuditConstants.START_CONTAINER,
>       "ContainerManagerImpl", "Container already running on this node!",
>       applicationID, containerId);
>   throw RPCUtil.getRemoteException("Container " + containerIdStr
>       + " already is running on this node!!");
> }
> this.readLock.lock();
> try {
>   if (!serviceStopped) {
>     // Create the application
>     Application application =
>         new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
>     if (null == context.getApplications().putIfAbsent(applicationID,
>         application)) {
>       LOG.info("Creating a new application reference for app " + applicationID);
>       LogAggregationContext logAggregationContext =
>           containerTokenIdentifier.getLogAggregationContext();
>       Map<ApplicationAccessType, String> appAcls =
>           container.getLaunchContext().getApplicationACLs();
>       context.getNMStateStore().storeApplication(applicationID,
>           buildAppProto(applicationID, user, credentials, appAcls,
>               logAggregationContext));
>       dispatcher.getEventHandler().handle(
>           new ApplicationInitEvent(applicationID, appAcls,
>               logAggregationContext));
>     }
>     this.context.getNMStateStore().storeContainer(containerId, request);
>     dispatcher.getEventHandler().handle(
>         new ApplicationContainerInitEvent(container));
>     this.context.getContainerTokenSecretManager().startContainerSuccessful(
>         containerTokenIdentifier);
>     NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
>         "ContainerManageImpl", applicationID, containerId);
>     // TODO launchedContainer misplaced -> doesn't necessarily mean a container
>     // launch. A finished Application will not launch containers.
>     metrics.launchedContainer();
>     metrics.allocateContainer(containerTokenIdentifier.getResource());
>   } else {
>     throw new YarnException(
>         "Container start failed as the NodeManager is " +
>         "in the process of shutting down");
>   }
> } finally {
>   this.readLock.unlock();
> }
> {code}
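One way to make the counter match its name is to increment it when the container actually reaches its running state rather than when the start request is accepted. The Java sketch below illustrates that idea under simplifying assumptions: `SketchContainerState`, `SketchMetrics`, `SketchContainer`, and `LaunchCountDemo` are hypothetical names, not NodeManager classes, and the enum is a much smaller stand-in for the real container state machine. It is not the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified container states; the real container state
// machine in the NodeManager has more states than this.
enum SketchContainerState { NEW, LOCALIZING, LOCALIZATION_FAILED, RUNNING, KILLING, DONE }

// Hypothetical stand-in for a metrics holder with two counters.
class SketchMetrics {
  private final AtomicInteger launchRequests = new AtomicInteger();
  private final AtomicInteger containersLaunched = new AtomicInteger();

  void startRequested() { launchRequests.incrementAndGet(); }
  void launchedContainer() { containersLaunched.incrementAndGet(); }
  int getLaunchRequests() { return launchRequests.get(); }
  int getContainersLaunched() { return containersLaunched.get(); }
}

class SketchContainer {
  private SketchContainerState state = SketchContainerState.NEW;
  private final SketchMetrics metrics;

  SketchContainer(SketchMetrics metrics) {
    this.metrics = metrics;
    // Counting at request time, which is what the quoted code effectively does.
    metrics.startRequested();
  }

  SketchContainerState getState() { return state; }

  void transition(SketchContainerState next) {
    // Count a launch only on the transition into RUNNING, so containers that
    // are killed or fail localization never reach this counter, and a
    // repeated RUNNING transition is not counted twice.
    if (next == SketchContainerState.RUNNING && state != SketchContainerState.RUNNING) {
      metrics.launchedContainer();
    }
    state = next;
  }
}

class LaunchCountDemo {
  public static void main(String[] args) {
    SketchMetrics metrics = new SketchMetrics();

    SketchContainer ok = new SketchContainer(metrics);
    ok.transition(SketchContainerState.LOCALIZING);
    ok.transition(SketchContainerState.RUNNING);
    ok.transition(SketchContainerState.DONE);

    SketchContainer failed = new SketchContainer(metrics);
    failed.transition(SketchContainerState.LOCALIZING);
    failed.transition(SketchContainerState.LOCALIZATION_FAILED);

    SketchContainer killed = new SketchContainer(metrics);
    killed.transition(SketchContainerState.KILLING);
    killed.transition(SketchContainerState.DONE);

    // Prints "requests=3 launched=1"
    System.out.println("requests=" + metrics.getLaunchRequests()
        + " launched=" + metrics.getContainersLaunched());
  }
}
```

Three start requests are accepted, but only one container actually runs, which is the gap between the two numbers that this issue describes.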
> In addition, there is no metric that counts containers whose localization failed.
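A localization-failure counter could be recorded at the point where localization fails. The Java sketch below follows the style of the `metrics.launchedContainer()` call quoted above, but `LocalizationMetricsSketch`, `localizationFailedContainer()`, `LocalizerSketch`, and `LocalizationMetricsDemo` are hypothetical names, and localization is reduced to a single boolean outcome. It is an illustration only, not the NodeManager's localizer or the attached patch.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical metrics holder; localizationFailedContainer() is an assumed
// name for the counter this issue reports as missing.
class LocalizationMetricsSketch {
  private final AtomicInteger containersLaunched = new AtomicInteger();
  private final AtomicInteger containersLocalizationFailed = new AtomicInteger();

  void launchedContainer() { containersLaunched.incrementAndGet(); }

  void localizationFailedContainer() { containersLocalizationFailed.incrementAndGet(); }

  int getContainersLaunched() { return containersLaunched.get(); }

  int getContainersLocalizationFailed() { return containersLocalizationFailed.get(); }
}

// Hypothetical localization step; the boolean stands in for whether all
// of the container's resources could be localized.
class LocalizerSketch {
  private final LocalizationMetricsSketch metrics;

  LocalizerSketch(LocalizationMetricsSketch metrics) { this.metrics = metrics; }

  // Returns true if the container may proceed to launch.
  boolean localize(boolean resourcesAvailable) {
    if (!resourcesAvailable) {
      // Record the failure where it happens, in its own counter.
      metrics.localizationFailedContainer();
      return false;
    }
    return true;
  }
}

class LocalizationMetricsDemo {
  public static void main(String[] args) {
    LocalizationMetricsSketch metrics = new LocalizationMetricsSketch();
    LocalizerSketch localizer = new LocalizerSketch(metrics);
    boolean[] attempts = {true, false, true};
    for (boolean available : attempts) {
      // Only containers that localized successfully go on to launch.
      if (localizer.localize(available)) {
        metrics.launchedContainer();
      }
    }
    // Prints "launched=2 localizationFailed=1"
    System.out.println("launched=" + metrics.getContainersLaunched()
        + " localizationFailed=" + metrics.getContainersLocalizationFailed());
  }
}
```

Keeping the localization-failure count separate from the launch count means neither number has to be inferred from the other.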
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)