Posted to yarn-issues@hadoop.apache.org by "ledion bitincka (JIRA)" <ji...@apache.org> on 2013/11/25 07:14:36 UTC

[jira] [Commented] (YARN-1440) Yarn aggregated logs are difficult for external tools to understand

    [ https://issues.apache.org/jira/browse/YARN-1440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831199#comment-13831199 ] 

ledion bitincka commented on YARN-1440:
---------------------------------------

{quote}
Storing the logs in HDFS 1-to-1 as they appear in the container log directories on the nodes would be a lot of files.
{quote}

[~jlowe]: from my understanding, the NodeManager creates one TFile for *each* container executed, within which it then encodes and stores all the log files that the container created. For example, for an MR application the TFile would contain stdout, stderr and syslog; usually the first two are empty (size 0), while syslog contains the app's logs. Therefore, there's no real reduction in the number of files created. How common is it for other YARN apps to have more than one log file?

{quote}
Would it be helpful for YARN to supply a public API that reads the files for you?
{quote}

[~sandyr]: that would be helpful; however, simple flat files would be the best API, since all the tools available for HDFS files would then work for log files too.
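As a rough illustration of the "flat files as the API" argument: once each log is an ordinary file, an external tool needs nothing beyond standard file I/O. The Python sketch below assumes the /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} layout proposed in the issue description, with the tree already copied to a local filesystem (for example via `hdfs dfs -get`); `grep_app_logs` is a hypothetical helper, not a YARN or HDFS API.

```python
import os
import re
from typing import Iterator, Tuple


def grep_app_logs(root: str, app_id: str, pattern: str) -> Iterator[Tuple[str, str, int, str]]:
    """Yield (container_id, log_file_name, line_number, line) for each match.

    Hypothetical helper. Assumes the flat layout
    {root}/{app_id}/{container_id}/{log_file_name} proposed in YARN-1440,
    already copied to a local directory, so every log is a plain text file
    and no TFile decoding is needed.
    """
    regex = re.compile(pattern)
    app_dir = os.path.join(root, app_id)
    for container_id in sorted(os.listdir(app_dir)):
        container_dir = os.path.join(app_dir, container_id)
        if not os.path.isdir(container_dir):
            continue  # ignore anything that is not a container directory
        for log_name in sorted(os.listdir(container_dir)):
            log_path = os.path.join(container_dir, log_name)
            if not os.path.isfile(log_path):
                continue  # only regular files are treated as logs
            # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
            with open(log_path, encoding="utf-8", errors="replace") as f:
                for line_no, line in enumerate(f, start=1):
                    if regex.search(line):
                        yield container_id, log_name, line_no, line.rstrip("\n")
```

For example, `list(grep_app_logs("/tmp/yarn-logs", "application_1384457474010_0001", r"ERROR"))` would collect every ERROR line across all containers of that (hypothetical) application. With the TFile format, the same search first requires decoding each aggregated file, e.g. through `yarn logs`.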

> Yarn aggregated logs are difficult for external tools to understand
> -------------------------------------------------------------------
>
>                 Key: YARN-1440
>                 URL: https://issues.apache.org/jira/browse/YARN-1440
>             Project: Hadoop YARN
>          Issue Type: Improvement
>            Reporter: ledion bitincka
>              Labels: log-aggregation, logs, tfile, yarn
>
> The log aggregation feature in Yarn is awesome! However, the format into which the log files are aggregated (TFile) should either be much simpler or be made pluggable. The current TFile format forces anyone who wants to see the files to either:
> a) use the web UI,
> b) use the CLI tools (yarn logs), or
> c) write custom code to read the files.
> My suggestion would be to simplify log collection by writing the raw log files into a directory structure as follows:
> {noformat}
> /{log-collection-dir}/{app-id}/{container-id}/{log-file-name} 
> {noformat}
> This way application developers can (re)use a much wider array of tools to process the logs.
> For readers who are not familiar with these logs and their format, you can find more info in the following two blog posts:
> http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/
> http://blogs.splunk.com/2013/11/18/hadoop-2-0-rant/
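The directory layout proposed in the issue description is simple enough that an external tool can map between a file path and its (application, container, log file) triple without any YARN code. A minimal Python sketch, assuming POSIX-style paths; `LogLocation`, `log_path` and `parse_log_path` are hypothetical names, not part of Hadoop.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class LogLocation(NamedTuple):
    """Components of one log file under the proposed flat layout (hypothetical type)."""
    app_id: str
    container_id: str
    log_file_name: str


def log_path(log_collection_dir: str, loc: LogLocation) -> str:
    """Build /{log-collection-dir}/{app-id}/{container-id}/{log-file-name}."""
    return str(PurePosixPath(log_collection_dir, loc.app_id, loc.container_id, loc.log_file_name))


def parse_log_path(log_collection_dir: str, path: str) -> LogLocation:
    """Inverse of log_path: split a file path back into its three components.

    Raises ValueError if the path is outside the collection directory or is
    not exactly three levels below it.
    """
    rel = PurePosixPath(path).relative_to(log_collection_dir)
    if len(rel.parts) != 3:
        raise ValueError(f"not an {{app-id}}/{{container-id}}/{{log-file-name}} path: {path}")
    return LogLocation(*rel.parts)
```

Because the path alone carries the application and container IDs, tools that only understand files and directories can filter, group or index the logs without opening them first.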



--
This message was sent by Atlassian JIRA
(v6.1#6144)