Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2007/04/11 20:39:32 UTC

[jira] Commented: (HADOOP-1199) want InputFormat for task logs

    [ https://issues.apache.org/jira/browse/HADOOP-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488143 ] 

Doug Cutting commented on HADOOP-1199:
--------------------------------------

> The number of splits is equal to the number of configured map tasks. (Do folks have better ideas regarding how to do the split?)

I was thinking that the InputFormat would read a config parameter to get a jobId, then use JobClient to query the jobtracker and get the URL for the task log of each task in that job, and package these URLs into the splits.  The 'getLocations()' implementation for these splits would return the hostname of the URL, so that attempts would be made to run the task on the host where the log resides.  Does that make sense?
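The split scheme described above can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not Hadoop code: java.util.Properties stands in for JobConf, the key tasklog.input.job.id is a made-up configuration name, and the JobClient/jobtracker query is replaced by a caller-supplied function from a job id to that job's task-log URLs. TaskLogSplit and TaskLogInputFormatSketch are hypothetical names; getLocations() mirrors the method of that name on Hadoop's InputSplit, but the class does not implement that interface.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.function.Function;

/** Hypothetical split holding one task's log URL (shaped like InputSplit.getLocations()). */
class TaskLogSplit {
    private final URI logUrl;

    TaskLogSplit(URI logUrl) {
        this.logUrl = logUrl;
    }

    URI getLogUrl() {
        return logUrl;
    }

    /** Preferred host for the map task: the node that serves this log. */
    String[] getLocations() {
        String host = logUrl.getHost();
        return host == null ? new String[0] : new String[] { host };
    }
}

/** Hypothetical input-format sketch; Properties stands in for JobConf. */
class TaskLogInputFormatSketch {
    // Hypothetical configuration key, not an actual Hadoop property.
    static final String JOB_ID_KEY = "tasklog.input.job.id";

    static void setJobId(Properties conf, String jobId) {
        conf.setProperty(JOB_ID_KEY, jobId);
    }

    static String getJobId(Properties conf) {
        String jobId = conf.getProperty(JOB_ID_KEY);
        if (jobId == null) {
            throw new IllegalStateException("no job id configured under " + JOB_ID_KEY);
        }
        return jobId;
    }

    /**
     * Builds one split per task log. lookupLogUrls is a stand-in (assumption)
     * for asking the jobtracker, e.g. via JobClient, for each task's log URL.
     */
    static List<TaskLogSplit> getSplits(Properties conf,
                                        Function<String, List<String>> lookupLogUrls) {
        List<TaskLogSplit> splits = new ArrayList<>();
        for (String url : lookupLogUrls.apply(getJobId(conf))) {
            splits.add(new TaskLogSplit(URI.create(url)));
        }
        return splits;
    }
}
```

Returning the URL's hostname from getLocations() only expresses a preference: if the scheduler places a task on another node, that task can still fetch its log over HTTP, so locality affects speed rather than correctness.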

> want InputFormat for task logs
> ------------------------------
>
>                 Key: HADOOP-1199
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1199
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Doug Cutting
>         Attachments: hadoop1199-v2.patch, hadoop1199.patch
>
>
> We should provide an InputFormat implementation that includes all the task logs from a job. Folks should be able to do something like:
> JobConf job = new JobConf();
> job.setInputFormat(TaskLogInputFormat.class);
> TaskLogInputFormat.setJobId(job, jobId);
> ...
> Tasks should ideally be localized to the node that each log is on.
> Examining logs should be as lightweight as possible, to facilitate debugging. It should not require a copy to HDFS. A faster debug loop is like a faster search engine: it makes people more productive. The sooner one can find that, e.g., most tasks failed with a NullPointerException on line 723, the better. 
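The debugging goal in the description, discovering that most tasks failed with the same exception at the same line, amounts to a group-and-count over the task logs. The following is a minimal sketch under assumptions: it expects a Java-style stack trace in which the exception class name appears on one line and its first "at ..." frame on the next, and FailureSummary, firstFailure and countFailures are hypothetical names. In an actual job, the per-log extraction would run in the map step and the counting in the reduce step.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that summarizes failures across task logs (assumed log format). */
class FailureSummary {
    // Fully qualified exception or error class, e.g. java.lang.NullPointerException.
    private static final Pattern EXCEPTION =
        Pattern.compile("\\b((?:[a-zA-Z_$][\\w$]*\\.)+[A-Z][\\w$]*(?:Exception|Error))\\b");
    // First stack frame, e.g. "\tat com.example.MyMapper.map(MyMapper.java:723)".
    private static final Pattern FRAME = Pattern.compile("^\\s*at\\s+(\\S+\\([^)]*\\))");

    /** Signature of the first exception in one log, plus its top frame if present. */
    static Optional<String> firstFailure(List<String> logLines) {
        for (int i = 0; i < logLines.size(); i++) {
            Matcher m = EXCEPTION.matcher(logLines.get(i));
            if (m.find()) {
                String signature = m.group(1);
                if (i + 1 < logLines.size()) {
                    Matcher frame = FRAME.matcher(logLines.get(i + 1));
                    if (frame.find()) {
                        signature += " at " + frame.group(1);
                    }
                }
                return Optional.of(signature);
            }
        }
        return Optional.empty();
    }

    /** Counts how many task logs share each failure signature. */
    static Map<String, Integer> countFailures(List<List<String>> logs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> log : logs) {
            firstFailure(log).ifPresent(sig -> counts.merge(sig, 1, Integer::sum));
        }
        return counts;
    }
}
```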

-- 
This message is automatically generated by JIRA.