Posted to common-dev@hadoop.apache.org by "stack@archive.org (JIRA)" <ji...@apache.org> on 2007/03/29 00:52:25 UTC

[jira] Updated: (HADOOP-1181) userlogs reader

     [ https://issues.apache.org/jira/browse/HADOOP-1181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack@archive.org updated HADOOP-1181:
--------------------------------------

    Attachment: hadoop1181.patch

Attached is a patch that changes TaskLog$Reader so it uses URLs instead of the file system.  It also:

+ Adds a constructor that takes a userlog subdirectory URL.
+ Adds a public getInputStream method that streams over all userlog parts.
+ Makes TaskLog and TaskLog$Reader public rather than default access.
+ Adds a main that takes a URL and prints the concatenated logs to stdout.
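
For readers without the patch to hand, here is a rough, standalone sketch of the general idea behind the streaming item above: present several log parts, each addressed by URL, as one InputStream and dump it to stdout. This is not the code in hadoop1181.patch. The class name UserLogConcat and the methods concat and copy are hypothetical, and the sketch assumes the caller already knows the URL of each part (it does not discover parts from a userlog subdirectory URL, nor does it parse the userlog format).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration only; not part of Hadoop or of the attached patch.
public class UserLogConcat {

    /**
     * Returns one stream over all parts, in list order. Parts are opened one
     * at a time: the next URL is opened only when the stream moves on to it.
     */
    public static InputStream concat(List<URL> parts) {
        final Iterator<URL> it = parts.iterator();
        return new SequenceInputStream(new Enumeration<InputStream>() {
            @Override
            public boolean hasMoreElements() {
                return it.hasNext();
            }

            @Override
            public InputStream nextElement() {
                try {
                    return it.next().openStream();
                } catch (IOException e) {
                    // Enumeration cannot throw checked exceptions.
                    throw new UncheckedIOException(e);
                }
            }
        });
    }

    /** Copies everything from in to out and returns the number of bytes copied. */
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    /** Each argument is assumed to be the URL of one log part. */
    public static void main(String[] args) throws IOException {
        List<URL> parts = new ArrayList<>();
        for (String arg : args) {
            parts.add(URI.create(arg).toURL());
        }
        try (InputStream in = concat(parts)) {
            copy(in, System.out);
        }
    }
}
```

Run it as `java UserLogConcat <part-url> [<part-url> ...]`. Because it works on URLs rather than local files, the same code reads file:, http:, or any other scheme the JVM has a handler for.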

I'll not mark this issue as 'patch ready' until others have had a gander.  It would be great if Arun C Murthy could review it, since he wrote the original.  In particular, it would be nice to know whether the calculation of totalLogSize in the TaskLog$Reader#fetchAll method (around line 384 in r523437) is important.  If not, some near-duplicate code could be replaced with a call to the new getInputStream in a second version of this patch.

> userlogs reader
> ---------------
>
>                 Key: HADOOP-1181
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1181
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: stack@archive.org
>         Attachments: hadoop1181.patch
>
>
> My jobs output lots of logging.  I want to be able to quickly parse the logs across the cluster for anomalies.  org.apache.hadoop.tools.Logalyzer looks promising at first, but it does not know how to deal with the userlog format and it wants to first copy all logs to the local machine.  After some digging, there does not currently seem to be a reader for the hadoop userlog format.  TaskLog$Reader is not generally accessible, and it too expects logs to be on the local filesystem (which is of little use if I want to run the analysis as a mapreduce job).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.