Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2006/06/08 12:41:29 UTC

[jira] Created: (HADOOP-291) Hadoop Log Archiver/Analyzer utility

Hadoop Log Archiver/Analyzer utility
------------------------------------

         Key: HADOOP-291
         URL: http://issues.apache.org/jira/browse/HADOOP-291
     Project: Hadoop
        Type: New Feature

  Components: util  
    Reporter: Arun C Murthy


Overview of the log archiver/analyzer utility...

1. Input
  The tool takes as input a list of directory URLs; each URL can also be associated with a file-pattern specifying which files in that directory are to be used (a small illustrative sketch of applying such a pattern follows the examples below).
  e.g. http://g1015:50030/logs/hadoop-sameer-jobtracker-*
         file:///export/crawlspace/sanjay/hadoop/trunk/run/logs/hadoop-sanjay-namenode-* (local disk on the machine on which the job was submitted)
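
  A minimal, purely illustrative sketch of how such a '*' file-pattern could be applied to the entries of a local log directory (the class and method names below are made up for this note, not part of the proposal):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Pattern;

    // Selects the files in a log directory whose names match a simple
    // '*' glob such as "hadoop-sameer-jobtracker-*".
    public class LogFileSelector {

      // Convert a '*' glob into a regular expression.
      static Pattern globToRegex(String glob) {
        StringBuilder regex = new StringBuilder();
        for (char c : glob.toCharArray()) {
          if (c == '*') {
            regex.append(".*");
          } else {
            regex.append(Pattern.quote(String.valueOf(c)));
          }
        }
        return Pattern.compile(regex.toString());
      }

      // List the files under 'dir' whose names match the glob.
      static List<File> select(File dir, String glob) {
        Pattern pattern = globToRegex(glob);
        List<File> matches = new ArrayList<File>();
        File[] entries = dir.listFiles();
        if (entries != null) {
          for (File f : entries) {
            if (f.isFile() && pattern.matcher(f.getName()).matches()) {
              matches.add(f);
            }
          }
        }
        return matches;
      }

      public static void main(String[] args) {
        // e.g. java LogFileSelector /path/to/logs 'hadoop-sanjay-namenode-*'
        for (File f : select(new File(args[0]), args[1])) {
          System.out.println(f.getPath());
        }
      }
    }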

2. The tool supports 2 main functions:

  a) Archival
    Archive the logs in the DFS under the following hierarchy:
    /users/<username>/log-archive/YYYY/mm/dd/HHMMSS.log by default,
    or under a user-specified directory:
    <input-dir>/YYYY/mm/dd/HHMMSS.log

  b) Processing with simple sort/grep primitives
    Archive the logs as above, then grep for lines matching a given pattern (e.g. INFO) and sort them by a spec such as <logger><level><date>. (Note: this is proposed with the current log4j-based logging in mind... do we need anything more generic?) The sort/grep specs are supplied by the user along with the directory URLs. A small sketch of the archive-path and sort-key construction follows below.
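
  A small sketch, not part of the proposal itself, of how the default archive path from 2(a) and a composite <logger><level><date> sort key from 2(b) might be built (the names here are hypothetical):

    import java.text.SimpleDateFormat;
    import java.util.Date;

    // Builds the default DFS archive path and a composite sort key.
    public class LogNaming {

      // /users/<username>/log-archive/YYYY/mm/dd/HHMMSS.log
      // (yyyy/MM/dd is year/month/day, HHmmss is hour/minute/second).
      static String archivePath(String username, Date when) {
        SimpleDateFormat dirFormat  = new SimpleDateFormat("yyyy/MM/dd");
        SimpleDateFormat fileFormat = new SimpleDateFormat("HHmmss");
        return "/users/" + username + "/log-archive/"
            + dirFormat.format(when) + "/" + fileFormat.format(when) + ".log";
      }

      // Composite key so that sorting the grep output orders lines by
      // logger, then level, then date.
      static String sortKey(String logger, String level, String date) {
        return logger + "\t" + level + "\t" + date;
      }

      public static void main(String[] args) {
        System.out.println(archivePath("sanjay", new Date()));
        System.out.println(sortKey("org.apache.hadoop.dfs.NameNode",
                                   "INFO", "2006-06-08 12:41:29"));
      }
    }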

3. Thoughts on implementation...

  a) Archival
    The current idea is to put a .jsp page (src/webapps) on each of the nodes, which does a *copyFromLocal* of the log file into the DFS. The jobtracker fires n map tasks, each of which simply hits the jsp page for one of the directory URLs (a rough sketch of this per-node fetch appears below, after 3b). The reduce task is a no-op that only collects statistics on failures, if any.

  b) Processing with sort/grep
    Here, the tool first archives the files as above; a second set of map-reduce tasks then does the sort/grep on the archived files in DFS, using the given specs.
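
  A rough sketch, again outside the proposal, of what the body of one archival map task from 3(a) could look like. It issues an HTTP GET against a hypothetical archiving jsp on one node (the jsp itself would do the copyFromLocal into DFS) and reports whether the request succeeded, standing in for the failure statistics the no-op reduce would collect:

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Fetches the per-node archiving jsp and reports success/failure.
    // The servlet path used below is invented for illustration only.
    public class ArchiveFetch {

      // Returns true if the jsp responded with HTTP 200.
      static boolean archiveNode(String jspUrl) {
        try {
          HttpURLConnection conn =
              (HttpURLConnection) new URL(jspUrl).openConnection();
          conn.setConnectTimeout(30 * 1000);
          conn.setReadTimeout(60 * 1000);
          int code = conn.getResponseCode();
          // Drain the response before closing the connection.
          InputStream in = conn.getInputStream();
          byte[] buf = new byte[4096];
          while (in.read(buf) != -1) { /* discard */ }
          in.close();
          return code == HttpURLConnection.HTTP_OK;
        } catch (Exception e) {
          return false;
        }
      }

      public static void main(String[] args) {
        // One URL per node, e.g. http://g1015:50030/archiveLogs.jsp (hypothetical)
        int failures = 0;
        for (String url : args) {
          if (!archiveNode(url)) {
            failures++;
          }
        }
        System.out.println("nodes failed: " + failures);
      }
    }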


                                                                                          - * - * - 

 Suggestions/corrections welcome...

thanks,
Arun

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (HADOOP-291) Hadoop Log Archiver/Analyzer utility

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved HADOOP-291.
----------------------------------

    Resolution: Duplicate

Duplicate of HADOOP-342




[jira] Updated: (HADOOP-291) Hadoop Log Archiver/Analyzer utility

Posted by "Sameer Paranjpye (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/HADOOP-291?page=all ]

Sameer Paranjpye updated HADOOP-291:
------------------------------------

    Fix Version/s: 0.6.0

