Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2016/04/21 21:20:25 UTC
[jira] [Commented] (YARN-4984) LogAggregationService shouldn't
swallow exception in handling createAppDir() which cause thread leak.
[ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252490#comment-15252490 ]
Junping Du commented on YARN-4984:
----------------------------------
The exception is swallowed in LogAggregationService.initAppAggregator():
{noformat}
    // wait until check for existing aggregator to create dirs
    YarnRuntimeException appDirException = null;
    try {
      // Create the app dir
      createAppDir(user, appId, userUgi);
    } catch (Exception e) {
      appLogAggregator.disableLogAggregation();
      if (!(e instanceof YarnRuntimeException)) {
        appDirException = new YarnRuntimeException(e);
      } else {
        appDirException = (YarnRuntimeException)e;
      }
    }
    ...
    // creating aggregator thread
{noformat}
We should throw the exception when createAppDir() fails.
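To illustrate the idea, here is a minimal, self-contained sketch (not the actual patch). The names AppDirFailureSketch, SketchRuntimeException, AppDirCreator and the aggregators map are hypothetical stand-ins, not part of LogAggregationService. It only shows the control flow: a failure in the directory-creation step propagates to the caller, so the code that starts the aggregator thread is never reached.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class AppDirFailureSketch {

  /** Hypothetical stand-in for YarnRuntimeException. */
  static class SketchRuntimeException extends RuntimeException {
    SketchRuntimeException(Throwable cause) {
      super(cause);
    }
  }

  /** Hypothetical stand-in for the createAppDir() step. */
  interface AppDirCreator {
    void createAppDir(String user, String appId) throws Exception;
  }

  // Aggregator threads that were actually started, keyed by application id.
  final ConcurrentMap<String, Thread> aggregators = new ConcurrentHashMap<>();

  void initAppAggregator(String user, String appId, AppDirCreator creator) {
    try {
      creator.createAppDir(user, appId);
    } catch (Exception e) {
      // Propagate the failure instead of storing it and continuing,
      // so the thread creation below is skipped for this app.
      if (e instanceof SketchRuntimeException) {
        throw (SketchRuntimeException) e;
      }
      throw new SketchRuntimeException(e);
    }

    // Reached only when the app dir was set up successfully.
    Thread aggregator = new Thread(() -> { }, "LogAggregator-" + appId);
    aggregator.setDaemon(true);
    aggregators.put(appId, aggregator);
    aggregator.start();
  }
}
```

With this shape, a caller that passes a creator which throws (for example an IOException for an invalid token) gets a SketchRuntimeException wrapping the cause, and the aggregators map stays empty for that app. Whether the real caller should log, clean up, or report the failure is left open here.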
> LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
> -----------------------------------------------------------------------------------------------------
>
> Key: YARN-4984
> URL: https://issues.apache.org/jira/browse/YARN-4984
> Project: Hadoop YARN
> Issue Type: Bug
> Components: log-aggregation
> Affects Versions: 2.7.2
> Reporter: Junping Du
> Assignee: Junping Du
> Priority: Critical
>
> Due to YARN-4325, many stale applications still exist in the NM state store and are recovered after an NM restart. App initialization fails because the token is invalid, but the exception is swallowed and an aggregator thread is still created for the invalid app.
> The exception is:
> {noformat}
> 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be found in cache
>     at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>     at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>     at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>     at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}