Posted to mapreduce-issues@hadoop.apache.org by "Binglin Chang (JIRA)" <ji...@apache.org> on 2013/09/26 05:44:06 UTC

[jira] [Updated] (MAPREDUCE-5381) Support graceful decommission of tasktracker

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Binglin Chang updated MAPREDUCE-5381:
-------------------------------------

    Attachment: MAPREDUCE-5381-graceful-decomm.v1.patch

Attaching a demo patch for graceful decommission of TaskTrackers. Changes:
1. Add an mradmin command, -decommission <host>, to gracefully decommission all TaskTrackers running on a host. The command sets the TaskTracker's slot capacity to 0 so that it accepts no new tasks, waits for all jobs running on that TaskTracker to finish, and then stops it.
2. This patch depends on MAPREDUCE-4900; it adds a new API (decommission) to DynamicResourceProtocol and JobTrackerJMXBean.
3. Tests will be included once we agree on the interface and implementation.

Approach:
Add a new field, runningJobs, to TaskTrackerStatus, which is included in the TaskTracker heartbeat. When a decommission command is invoked, the JobTracker first changes the slot capacity of the affected TaskTrackers to 0, then waits for their runningJobs counters to reach 0, which means all jobs running on those TaskTrackers have finished and been cleaned up. The JobTracker then shuts each TaskTracker down by rejecting its heartbeat.
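
To make the sequence above concrete, here is a minimal, self-contained sketch of the decommission flow. It is not code from the attached patch: the class GracefulDecommissionSketch, its methods, and the HeartbeatResponse enum are hypothetical names chosen for illustration, and the real JobTracker and TaskTrackerStatus are far more involved. The only ideas taken from the description are the runningJobs count carried in each heartbeat, the slot capacity being set to 0, and shutdown by rejecting the heartbeat.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of the approach; not the patch's code. */
class GracefulDecommissionSketch {

    /** Reply the simulated JobTracker gives to a TaskTracker heartbeat. */
    enum HeartbeatResponse { OK, REJECTED }

    /** Per-tracker bookkeeping kept by the simulated JobTracker. */
    private static final class TrackerState {
        int slotCapacity;
        boolean decommissioning;

        TrackerState(int slotCapacity) {
            this.slotCapacity = slotCapacity;
        }
    }

    private final Map<String, TrackerState> trackers = new HashMap<>();

    /** Registers a tracker with its initial slot capacity. */
    void register(String host, int slotCapacity) {
        trackers.put(host, new TrackerState(slotCapacity));
    }

    /** Step 1: drop the capacity to 0 so no new tasks are assigned. */
    void decommission(String host) {
        TrackerState state = lookup(host);
        state.slotCapacity = 0;
        state.decommissioning = true;
    }

    /** Current slot capacity of a known tracker. */
    int slotCapacity(String host) {
        return lookup(host).slotCapacity;
    }

    /**
     * Steps 2 and 3: every heartbeat reports runningJobs. A decommissioning
     * tracker keeps getting OK while jobs remain; once the count reaches 0
     * it is forgotten and its heartbeat is rejected, which stops it.
     */
    HeartbeatResponse heartbeat(String host, int runningJobs) {
        TrackerState state = trackers.get(host);
        if (state == null) {
            return HeartbeatResponse.REJECTED; // unknown or already removed
        }
        if (state.decommissioning && runningJobs == 0) {
            trackers.remove(host);
            return HeartbeatResponse.REJECTED;
        }
        return HeartbeatResponse.OK;
    }

    private TrackerState lookup(String host) {
        TrackerState state = trackers.get(host);
        if (state == null) {
            throw new IllegalArgumentException("unknown tracker: " + host);
        }
        return state;
    }
}
```

In this model, a tracker that still reports running jobs after decommission(host) continues to receive OK, so in-flight work is not disturbed; only the first heartbeat that reports zero running jobs is rejected.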
                
> Support graceful decommission of tasktracker
> --------------------------------------------
>
>                 Key: MAPREDUCE-5381
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5381
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>    Affects Versions: 1.2.0
>            Reporter: Luke Lu
>            Assignee: Binglin Chang
>         Attachments: MAPREDUCE-5381-graceful-decomm.v1.patch
>
>
> When TTs are decommissioned for non-fault reasons (capacity change etc.), it's desirable to minimize the impact to running jobs.
> Currently, if a TT is decommissioned, all running tasks on the TT need to be rescheduled on other TTs. Furthermore, for finished map tasks, if their map outputs have not been fetched by the reducers of the job, these map tasks will need to be rerun as well.
> We propose to introduce a mechanism to optionally gracefully decommission a tasktracker.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira