Posted to yarn-issues@hadoop.apache.org by "Adam Antal (JIRA)" <ji...@apache.org> on 2019/06/03 12:48:00 UTC

[jira] [Comment Edited] (YARN-9525) IFile format is not working against s3a remote folder

    [ https://issues.apache.org/jira/browse/YARN-9525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854556#comment-16854556 ] 

Adam Antal edited comment on YARN-9525 at 6/3/19 12:47 PM:
-----------------------------------------------------------

Setting the rollover size to 0 still fails, for the following reason:

In {{LogAggregationIndexedFileController$initializeWriterInRolling}}, when we initialize the writer:
{code:java}
    // recreate checksum file if needed before aggregate the logs
    if (overwriteCheckSum) {
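      // this getFileStatus() of the file we have just written is the call
      // that fails on S3A: the file may not be visible yet (eventual consistency)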
      final long currentAggregatedLogFileLength = fc
          .getFileStatus(aggregatedLogFile).getLen();
      FSDataOutputStream checksumFileOutputStream = null;
      try {
        checksumFileOutputStream = fc.create(remoteLogCheckSumFile,
            EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
            new Options.CreateOpts[] {});
        String fileName = aggregatedLogFile.getName();
        checksumFileOutputStream.writeInt(fileName.length());
        checksumFileOutputStream.write(fileName.getBytes(
            Charset.forName("UTF-8")));
        checksumFileOutputStream.writeLong(
            currentAggregatedLogFileLength);
        checksumFileOutputStream.flush();
      } finally {
        IOUtils.cleanupWithLogger(LOG, checksumFileOutputStream);
      }
    }
{code}

We fail on the {{getFileStatus}} call, because we want to get the status of the file we have just written and then try to fetch its length. I wonder if we only do this because of the length - that information could be calculated while writing, and then there would be no need to query it through {{S3AFileSystem$s3GetFileStatus}}.
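
Just to sketch the idea (purely illustrative - {{TrackedLengthChecksumWriter}} and the {{trackedLogLength}} parameter below are hypothetical names, not code from the Hadoop tree): if the writer accumulated the number of bytes it wrote out, the checksum file could be created from that count and the {{getFileStatus}} round trip would disappear:
{code:java}
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.EnumSet;

import org.apache.hadoop.fs.CreateFlag;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.Options;
import org.apache.hadoop.fs.Path;

/**
 * Hypothetical sketch, not Hadoop code: create the checksum file from a
 * length the caller tracked while writing the aggregated log, so the
 * just-written log file is never read back through getFileStatus() and
 * the S3A eventual-consistency window is never hit.
 */
final class TrackedLengthChecksumWriter {

  static void writeChecksumFile(FileContext fc, Path remoteLogCheckSumFile,
      Path aggregatedLogFile, long trackedLogLength) throws IOException {
    FSDataOutputStream out = fc.create(remoteLogCheckSumFile,
        EnumSet.of(CreateFlag.CREATE, CreateFlag.OVERWRITE),
        new Options.CreateOpts[] {});
    try {
      // same layout as the current checksum file: name length, name, length
      String fileName = aggregatedLogFile.getName();
      out.writeInt(fileName.length());
      out.write(fileName.getBytes(StandardCharsets.UTF_8));
      // the length is the byte count accumulated by the writer,
      // replacing fc.getFileStatus(aggregatedLogFile).getLen()
      out.writeLong(trackedLogLength);
      out.flush();
    } finally {
      out.close();
    }
  }
}
{code}
The tracked length could come from the log writer's own bookkeeping or from {{FSDataOutputStream#getPos()}} on the stream being written; either way it is computed on the client side instead of being asked back from S3.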



> IFile format is not working against s3a remote folder
> -----------------------------------------------------
>
>                 Key: YARN-9525
>                 URL: https://issues.apache.org/jira/browse/YARN-9525
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: log-aggregation
>    Affects Versions: 3.1.2
>            Reporter: Adam Antal
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: IFile-S3A-POC01.patch, YARN-9525-001.patch
>
>
> Using the IndexedFileFormat with {{yarn.nodemanager.remote-app-log-dir}} configured to an s3a URI throws the following exception during log aggregation:
> {noformat}
> Cannot create writer for app application_1556199768861_0001. Skip log upload this time. 
> java.io.IOException: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
> 	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:247)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.uploadLogsForContainers(AppLogAggregatorImpl.java:306)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.doAppLogAggregation(AppLogAggregatorImpl.java:464)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.AppLogAggregatorImpl.run(AppLogAggregatorImpl.java:420)
> 	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService$1.run(LogAggregationService.java:276)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: No such file or directory: s3a://adamantal-log-test/logs/systest/ifile/application_1556199768861_0001/adamantal-3.gce.cloudera.com_8041
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2488)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2382)
> 	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2321)
> 	at org.apache.hadoop.fs.DelegateToFileSystem.getFileStatus(DelegateToFileSystem.java:128)
> 	at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1244)
> 	at org.apache.hadoop.fs.FileContext$15.next(FileContext.java:1240)
> 	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> 	at org.apache.hadoop.fs.FileContext.getFileStatus(FileContext.java:1246)
> 	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController$1.run(LogAggregationIndexedFileController.java:228)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
> 	at org.apache.hadoop.yarn.logaggregation.filecontroller.ifile.LogAggregationIndexedFileController.initializeWriter(LogAggregationIndexedFileController.java:195)
> 	... 7 more
> {noformat}
> This stack trace points to {{LogAggregationIndexedFileController$initializeWriter}}, where we do the following steps (in a non-rolling log aggregation setup):
> - create an FSDataOutputStream
> - write out a UUID
> - flush
> - immediately after that, call getFileStatus to get the length of the log file (the bytes we just wrote out), and that's where the failure happens: the file is not there yet due to eventual consistency.
> Maybe we can get rid of that, so we can use the IFile format against an s3a target.


