Posted to yarn-issues@hadoop.apache.org by "Daryn Sharp (JIRA)" <ji...@apache.org> on 2017/03/29 16:03:41 UTC

[jira] [Reopened] (YARN-3760) Log aggregation failures

     [ https://issues.apache.org/jira/browse/YARN-3760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daryn Sharp reopened YARN-3760:
-------------------------------

Line numbers are from an old release but the error is evident.
{code}
java.lang.IllegalStateException: Cannot close TFile in the middle of key-value insertion.
        at org.apache.hadoop.io.file.tfile.TFile$Writer.close(TFile.java:310)
        at org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat$LogWriter.close(AggregatedLogFormat.java:456)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:326)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:429)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:388)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$2.run(LogAggregationService.java:387)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:722)
{code}

_AggregatedLogFormat.LogWriter_
{code}
    public void close() {
      try {
        this.writer.close();
      } catch (IOException e) {
        LOG.warn("Exception closing writer", e);
      }
      IOUtils.closeStream(fsDataOStream);
    }
{code}
TFile.Writer's close may throw {{IllegalStateException}} if the underlying fs data stream failed.  Unfortunately, LogWriter.close only catches IOE, so the ISE rips out w/o closing the fs data stream.
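A minimal sketch of one way to make close() release the stream even when the writer throws an unchecked exception: move the stream cleanup into a {{finally}} block. {{SafeLogWriter}} and its {{closeQuietly}} helper are hypothetical, not Hadoop classes; plain {{java.io.Closeable}} objects stand in for TFile.Writer and the FSDataOutputStream so the example runs without Hadoop on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Hypothetical stand-in for AggregatedLogFormat.LogWriter (not the Hadoop class).
 * "writer" plays the role of TFile.Writer, "stream" the FSDataOutputStream.
 */
class SafeLogWriter implements Closeable {
  private final Closeable writer;
  private final Closeable stream;

  SafeLogWriter(Closeable writer, Closeable stream) {
    this.writer = writer;
    this.stream = stream;
  }

  @Override
  public void close() {
    try {
      writer.close();
    } catch (IOException e) {
      // Same handling as the original: log and carry on.
      System.err.println("Exception closing writer: " + e);
    } finally {
      // Runs for IOException, IllegalStateException, or any other throwable,
      // so the underlying stream is always closed.
      closeQuietly(stream);
    }
  }

  /** Hypothetical helper similar in spirit to IOUtils.closeStream: close, swallow IOException. */
  static void closeQuietly(Closeable c) {
    if (c == null) {
      return;
    }
    try {
      c.close();
    } catch (IOException ignored) {
      // nothing more useful to do here
    }
  }
}
```

In this sketch the ISE still propagates to the caller after the stream is closed; whether LogWriter should also swallow unchecked exceptions is a separate design choice.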

Additionally, the ctor creates the fs data stream and then a TFile.Writer w/o a try/catch.  If the TFile.Writer ctor throws an exception, it's impossible to close the stream, because the LogWriter is never returned to the caller.
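A minimal sketch of guarding the constructor, assuming the writer is built from an already-open stream: if building the writer fails, the constructor closes the stream itself before rethrowing. {{WriterFactory}} and {{GuardedLogWriter}} are hypothetical names; the factory stands in for the {{new TFile.Writer(...)}} call, and plain {{Closeable}} objects stand in for the Hadoop types.

```java
import java.io.Closeable;
import java.io.IOException;

/** Hypothetical hook standing in for "new TFile.Writer(stream, ...)", which may throw. */
interface WriterFactory {
  Closeable create(Closeable stream) throws IOException;
}

/**
 * Hypothetical constructor pattern for LogWriter (not the Hadoop class):
 * if building the writer fails, close the already-open stream before rethrowing,
 * because the caller never receives a reference it could close.
 */
class GuardedLogWriter implements Closeable {
  private final Closeable stream;
  private final Closeable writer;

  GuardedLogWriter(Closeable stream, WriterFactory factory) throws IOException {
    this.stream = stream;
    Closeable w;
    try {
      w = factory.create(stream);
    } catch (IOException | RuntimeException | Error e) {
      closeQuietly(stream);
      throw e;
    }
    this.writer = w;
  }

  @Override
  public void close() throws IOException {
    try {
      writer.close();
    } finally {
      // Always release the stream, even if writer.close() throws.
      stream.close();
    }
  }

  private static void closeQuietly(Closeable c) {
    try {
      c.close();
    } catch (IOException ignored) {
      // the original construction failure is the one worth reporting
    }
  }
}
```

Catching {{RuntimeException}} and {{Error}} alongside {{IOException}} matters here for the same reason as in close(): an unchecked exception from the writer ctor would otherwise leak the stream.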

I haven't checked if there are further issues with closing the writer higher up the stack.

> Log aggregation failures 
> -------------------------
>
>                 Key: YARN-3760
>                 URL: https://issues.apache.org/jira/browse/YARN-3760
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.4.0
>            Reporter: Daryn Sharp
>            Priority: Critical
>
> The aggregated log file does not appear to be properly closed when writes fail.  This leaves a lease renewer active in the NM that spams the NN with lease renewals.  If the token is marked not to be cancelled, the renewals appear to continue until the token expires.  If the token is cancelled, the periodic renew spam turns into a flood of failed connections until the lease renewer gives up.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
