Posted to mapreduce-issues@hadoop.apache.org by "Robert Joseph Evans (JIRA)" <ji...@apache.org> on 2012/11/09 21:15:13 UTC

[jira] [Commented] (MAPREDUCE-4751) AM stuck in KILL_WAIT for days

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13494279#comment-13494279 ] 

Robert Joseph Evans commented on MAPREDUCE-4751:
------------------------------------------------

I did a quick once-over of this, and I have a few comments.

# I think it would be cleaner for KillWaitAttemptKilledTransition to have a constructor that takes a TaskAttemptCompletionEventStatus, instead of having the subclasses set it themselves.
# Remove the commented-out if statement.
# I am not sure HashSet is the correct data type for success, failed, etc.  They are likely to be sparse arrays with only a small amount of data in them.  This is probably not very important, but with thousands of tasks it starts to add up.
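
To illustrate the first comment, here is a minimal sketch of the constructor-based approach. It does not use the real Hadoop classes: CompletionStatus is a stand-in for TaskAttemptCompletionEventStatus, and the subclass names, the finalStatus() accessor, and the demo class are hypothetical, not taken from the patch.

```java
// Stand-in for TaskAttemptCompletionEventStatus (assumption: the real enum
// lives in Hadoop and may carry more values than shown here).
enum CompletionStatus { FAILED, KILLED, SUCCEEDED }

// Base transition: the status is fixed once, through the constructor.
class KillWaitAttemptKilledTransition {
    private final CompletionStatus finalStatus;

    KillWaitAttemptKilledTransition(CompletionStatus finalStatus) {
        this.finalStatus = finalStatus;
    }

    // Hypothetical accessor, used here only to observe the result.
    CompletionStatus finalStatus() {
        return finalStatus;
    }
}

// Hypothetical subclasses: each passes its status up instead of
// assigning a field of the parent directly.
class FailedSketchTransition extends KillWaitAttemptKilledTransition {
    FailedSketchTransition() {
        super(CompletionStatus.FAILED);
    }
}

class SucceededSketchTransition extends KillWaitAttemptKilledTransition {
    SucceededSketchTransition() {
        super(CompletionStatus.SUCCEEDED);
    }
}

class KillWaitSketchDemo {
    public static void main(String[] args) {
        System.out.println(new KillWaitAttemptKilledTransition(CompletionStatus.KILLED).finalStatus());
        System.out.println(new FailedSketchTransition().finalStatus());
        System.out.println(new SucceededSketchTransition().finalStatus());
    }
}
```

Because the field is final and set in one place, a subclass cannot forget to set it or change it later.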

Overall it looks OK.  I would like to see more tests though.
                
> AM stuck in KILL_WAIT for days
> ------------------------------
>
>                 Key: MAPREDUCE-4751
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4751
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.23.3, 2.0.2-alpha
>            Reporter: Ravi Prakash
>            Assignee: Vinod Kumar Vavilapalli
>         Attachments: MAPREDUCE-4751-20121108.txt, TaskAttemptStateGraph.jpg
>
>
> We found some jobs were stuck in KILL_WAIT for days on end. The RM shows them as RUNNING. When you go to the AM, it shows the job in the KILL_WAIT state with a few maps still running. All these maps were scheduled on nodes which are now in the RM's Lost nodes list. The running maps are in the FAIL_CONTAINER_CLEANUP state.
