Posted to yarn-issues@hadoop.apache.org by "Junping Du (JIRA)" <ji...@apache.org> on 2016/04/21 21:20:25 UTC

[jira] [Commented] (YARN-4984) LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.

    [ https://issues.apache.org/jira/browse/YARN-4984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15252490#comment-15252490 ] 

Junping Du commented on YARN-4984:
----------------------------------

The exception is swallowed in LogAggregationService.initAppAggregator():

{noformat}
    // wait until check for existing aggregator to create dirs
    YarnRuntimeException appDirException = null;
    try {
      // Create the app dir
      createAppDir(user, appId, userUgi);
    } catch (Exception e) {
      appLogAggregator.disableLogAggregation();
      if (!(e instanceof YarnRuntimeException)) {
        appDirException = new YarnRuntimeException(e);
      } else {
        appDirException = (YarnRuntimeException)e;
      }
    }
... 
// creating aggregator thread
{noformat}
We should throw the exception when createAppDir() fails, instead of going on to create the aggregator thread.
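
As a rough illustration of the proposed behavior (not the actual patch), the sketch below fails fast: the directory-creation error is propagated before any aggregator thread is started, so a stale app does not leave a thread behind. The class AppDirFailFastSketch, the DirCreator interface, and aggregatorCount() are hypothetical names made up for this example, and a plain RuntimeException stands in for YarnRuntimeException so the code compiles without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AppDirFailFastSketch {

  /** Hypothetical stand-in for the remote app-dir setup step. */
  public interface DirCreator {
    void createAppDir(String appId) throws Exception;
  }

  private final Map<String, Thread> aggregators = new ConcurrentHashMap<>();
  private final DirCreator dirCreator;

  public AppDirFailFastSketch(DirCreator dirCreator) {
    this.dirCreator = dirCreator;
  }

  public void initAppAggregator(String appId) {
    try {
      dirCreator.createAppDir(appId);
    } catch (Exception e) {
      // Propagate the failure instead of recording it and carrying on.
      // RuntimeException is an assumption standing in for YarnRuntimeException.
      if (e instanceof RuntimeException) {
        throw (RuntimeException) e;
      }
      throw new RuntimeException(e);
    }
    // Only reached when the app dir was set up successfully.
    Thread t = new Thread(() -> { }, "LogAggregator-" + appId);
    t.setDaemon(true);
    aggregators.put(appId, t);
    t.start();
  }

  public int aggregatorCount() {
    return aggregators.size();
  }

  public static void main(String[] args) {
    // Simulate a stale app whose token is no longer valid.
    AppDirFailFastSketch svc = new AppDirFailFastSketch(appId -> {
      if (appId.startsWith("stale")) {
        throw new IOException("token can't be found in cache");
      }
    });
    svc.initAppAggregator("app_1");
    try {
      svc.initAppAggregator("stale_app_2");
    } catch (RuntimeException e) {
      System.out.println("init failed: " + e.getCause().getMessage());
    }
    System.out.println("aggregators: " + svc.aggregatorCount());
  }
}
```

In this sketch the caller sees the failure and no thread is registered for the stale app. Whether the real service should also clean up other per-app state at that point is outside what this example covers.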

> LogAggregationService shouldn't swallow exception in handling createAppDir() which cause thread leak.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-4984
>                 URL: https://issues.apache.org/jira/browse/YARN-4984
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 2.7.2
>            Reporter: Junping Du
>            Assignee: Junping Du
>            Priority: Critical
>
> Due to YARN-4325, many stale applications still exist in the NM state store and get recovered after an NM restart. App initialization then fails because the token is invalid, but the exception is swallowed and an aggregator thread is still created for the invalid app.
> The exception is:
> {noformat}
> 2016-04-19 23:38:33,039 ERROR logaggregation.LogAggregationService (LogAggregationService.java:run(300)) - Failed to setup application log directory for application_1448060878692_11842
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 1380589 for hdfswrite) can't be found in cache
>         at org.apache.hadoop.ipc.Client.call(Client.java:1427)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1358)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>         at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
>         at sun.reflect.GeneratedMethodAccessor76.invoke(Unknown Source)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
>         at com.sun.proxy.$Proxy14.getFileInfo(Unknown Source)
>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1315)
>         at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1311)
>         at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1311)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.checkExists(LogAggregationService.java:248)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.access$100(LogAggregationService.java:67)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:261)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initAppAggregator(LogAggregationService.java:367)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.initApp(LogAggregationService.java:320)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:447)
>         at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.handle(LogAggregationService.java:67)
> {noformat}


