Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2012/11/22 03:04:58 UTC

[jira] [Created] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization

Jason Lowe created MAPREDUCE-4818:
-------------------------------------

             Summary: Easier identification of tasks that timeout during localization
                 Key: MAPREDUCE-4818
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4818
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mr-am
    Affects Versions: 0.23.3, 2.0.3-alpha
            Reporter: Jason Lowe


When a task takes too long to localize and the AM kills it for exceeding the task timeout, the job UI and history offer little help.  The attempt lists only a diagnostic stating that it was killed due to timeout, and there are no logs for the attempt because it never actually started.  Log messages on the NM show that the container had not made it past localization by the time it was killed, but users often do not have access to those logs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (MAPREDUCE-4818) Easier identification of tasks that timeout during localization

Posted by "Jason Lowe (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-4818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504084#comment-13504084 ] 

Jason Lowe commented on MAPREDUCE-4818:
---------------------------------------

It would help if the AM could query the NM to determine if a container is localizing.  Then the AM could track containers that have never been seen in the RUNNING state.  If TaskAttemptListener times out then we could query the NM to see if the container is still localizing and use a different, configurable timeout for localizing vs. pinging.
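
The idea of separate timeouts for localizing vs. pinging can be sketched as follows. This is only an illustration under stated assumptions, not the actual MR AM code: the class LocalizationAwareTimeout, its methods, and the ContainerPhase and Verdict enums are all hypothetical, and the first ping from an attempt stands in for the proposed AM-to-NM query that would report whether a container is still localizing.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: tracks whether an attempt has ever been seen RUNNING
 * and applies a different timeout while it is (assumed to be) localizing.
 * None of these names exist in Hadoop; they are for illustration only.
 */
public class LocalizationAwareTimeout {

    /** Assumed phases; a real AM would need to learn these from the NM. */
    public enum ContainerPhase { LOCALIZING, RUNNING }

    /** Result of a timeout check, so the diagnostic can name the phase. */
    public enum Verdict { ALIVE, TIMED_OUT_LOCALIZING, TIMED_OUT_RUNNING }

    private final long localizeTimeoutMs;
    private final long pingTimeoutMs;
    // Time of container launch or of the most recent ping, per attempt.
    private final Map<String, Long> lastProgress = new HashMap<>();
    private final Map<String, ContainerPhase> phase = new HashMap<>();

    public LocalizationAwareTimeout(long localizeTimeoutMs, long pingTimeoutMs) {
        this.localizeTimeoutMs = localizeTimeoutMs;
        this.pingTimeoutMs = pingTimeoutMs;
    }

    /** Called when the container is launched; it has not been seen RUNNING yet. */
    public void containerLaunched(String attemptId, long nowMs) {
        lastProgress.put(attemptId, nowMs);
        phase.put(attemptId, ContainerPhase.LOCALIZING);
    }

    /** Called on every ping; the first ping proves localization finished. */
    public void pingReceived(String attemptId, long nowMs) {
        lastProgress.put(attemptId, nowMs);
        phase.put(attemptId, ContainerPhase.RUNNING);
    }

    /** Picks the timeout that matches the attempt's current phase. */
    public Verdict check(String attemptId, long nowMs) {
        Long last = lastProgress.get(attemptId);
        if (last == null) {
            throw new IllegalArgumentException("unknown attempt: " + attemptId);
        }
        long idleMs = nowMs - last;
        if (phase.get(attemptId) == ContainerPhase.LOCALIZING) {
            return idleMs > localizeTimeoutMs
                    ? Verdict.TIMED_OUT_LOCALIZING : Verdict.ALIVE;
        }
        return idleMs > pingTimeoutMs ? Verdict.TIMED_OUT_RUNNING : Verdict.ALIVE;
    }

    public static void main(String[] args) {
        // Illustrative values: 10 minutes to localize, 1 minute between pings.
        LocalizationAwareTimeout monitor =
                new LocalizationAwareTimeout(600_000L, 60_000L);
        monitor.containerLaunched("attempt_1", 0L);
        System.out.println(monitor.check("attempt_1", 120_000L));
        System.out.println(monitor.check("attempt_1", 600_001L));
    }
}
```

Returning a distinct verdict for the localizing case is what would let the job UI/history say that an attempt was killed during localization rather than reporting a generic timeout.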
                
