Posted to mapreduce-issues@hadoop.apache.org by "Scott Chen (JIRA)" <ji...@apache.org> on 2009/12/04 19:49:20 UTC

[jira] Commented: (MAPREDUCE-1265) Include tasktracker name in the task attempt error log

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786066#action_12786066 ] 

Scott Chen commented on MAPREDUCE-1265:
---------------------------------------

I just realized that the job id is part of the task attempt id, so we can easily obtain it from there.
So we only need to log the tasktracker name here.
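
To illustrate why the job id does not need to be logged separately, here is a minimal sketch that derives it from a task attempt id by plain string handling. It assumes the id layout attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number> seen in the log lines and the usual job_<jobtracker id>_<job number> form for job ids. The class and method names are hypothetical and are not part of Hadoop or of the attached patch.

```java
// Sketch only: the names AttemptIdSketch and jobIdOf are hypothetical.
// Assumed attempt id layout:
//   attempt_<jobtracker id>_<job number>_<m|r>_<task number>_<attempt number>
class AttemptIdSketch {

    // Returns the job id embedded in the given task attempt id.
    static String jobIdOf(String attemptId) {
        String[] parts = attemptId.split("_");
        if (parts.length != 6 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("Not a task attempt id: " + attemptId);
        }
        // The second and third fields identify the job.
        return "job_" + parts[1] + "_" + parts[2];
    }

    public static void main(String[] args) {
        // Uses the masked attempt id from the log sample in this comment.
        System.out.println(jobIdOf("attempt_2009xxxx_xxxx_r_000009_1"));
    }
}
```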

Here is the log after the change:
2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1 *on tracker_m01.aaa.com*: Error: java.lang.OutOfMemoryError: Java heap space
2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0 *on tracker_m02.aaa.com*: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!
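
With the tracker name in the message, collecting the errors for one node becomes a simple text match. The following is a rough sketch, assuming the line format shown above and assuming the asterisks are only emphasis markup in this comment rather than part of the real log output (the pattern tolerates both). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: TrackerLogSketch, trackerOf and errorsOn are hypothetical names.
class TrackerLogSketch {

    // Matches "Error from <attempt id> on <tracker name>: ".
    // The optional '*' covers the emphasis markup used in the sample lines.
    private static final Pattern ERROR_LINE =
        Pattern.compile("Error from (attempt_\\S+) \\*?on (\\S+?)\\*?: ");

    // Returns the tracker name in the line, or null if the line has none.
    static String trackerOf(String line) {
        Matcher m = ERROR_LINE.matcher(line);
        return m.find() ? m.group(2) : null;
    }

    // Keeps only the error lines reported for the given tracker.
    static List<String> errorsOn(String tracker, List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            if (tracker.equals(trackerOf(line))) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: "
            + "Error from attempt_2009xxxx_xxxx_r_000009_1 on tracker_m01.aaa.com: "
            + "Error: java.lang.OutOfMemoryError: Java heap space";
        System.out.println(trackerOf(line));
    }
}
```

In practice a plain grep for the hostname over the JobTracker log does the same job. The sketch only shows that the new format is easy to parse.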

> Include tasktracker name in the task attempt error log
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-1265
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1265
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.22.0
>            Reporter: Scott Chen
>            Assignee: Scott Chen
>            Priority: Trivial
>             Fix For: 0.22.0
>
>         Attachments: MAPREDUCE-1265-v2.patch, MAPREDUCE-1265.patch
>
>
> When a task attempt receives an error, TaskInProgress logs the task attempt id and the diagnostic string in the JobTracker log.
> Ex:
> 2009-xx-xx 23:50:45,994 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_r_000009_1: Error: java.lang.OutOfMemoryError: Java heap space
> 2009-xx-xx 22:53:53,146 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_2009xxxx_xxxx_m_000478_0: Task attempt_2009xxxx_xxxx_m_000478_0 failed to report status for 601 seconds. Killing!
> When we want to debug a machine, for example a blacklisted node, we have to use the task attempt id to find this information. This is not very convenient.
> It would be nice if we could also log the tasktracker that causes the error.
> This way we can just grep for the hostname to quickly find all the relevant error messages.
