Posted to mapreduce-user@hadoop.apache.org by Dino Kečo <di...@gmail.com> on 2011/12/12 14:50:58 UTC

Pause and Resume Hadoop map reduce job

Hi Hadoop users,

In my company we have been using Hadoop for 2 years, and we need to
pause and resume MapReduce jobs. I searched Hadoop JIRA and found a
couple of tickets that are not resolved, so we implemented our own
solution. I would like to share this approach with you and hear your
opinions about it.

We created one special pool in the Fair Scheduler called PAUSE
(maxMapTasks = 0, maxReduceTasks = 0). To pause a job, we move it into
this pool and kill all of its running tasks. When we want to resume the
job, we move it into some other pool. While jobs are paused we can do
maintenance on the whole cluster except the JobTracker, and we also use
this window to do maintenance on some external services the jobs depend
on.
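For illustration, a pool like this can be declared in the Fair Scheduler allocation file (a sketch against the classic MRv1 Fair Scheduler; the file path is whatever mapred.fairscheduler.allocation.file points to, and the pool name is of course ours):

```xml
<?xml version="1.0"?>
<!-- fair-scheduler.xml: Fair Scheduler allocation file (sketch) -->
<allocations>
  <!-- Jobs moved into this pool get zero task slots,
       so they are effectively paused. -->
  <pool name="PAUSE">
    <maxMaps>0</maxMaps>
    <maxReduces>0</maxReduces>
  </pool>
</allocations>
```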

We know that records being processed by the killed tasks will be
reprocessed. In some cases we use the same HBase table as both input and
output, and we store the job ID on each record. When a record is
reprocessed, we check this job ID and skip the record if it was already
processed by the same job.
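The job-ID check can be sketched as below. This is only an illustration of the idea: a plain in-memory map stands in for the HBase table, and the class, method, and row-key names are hypothetical, not from our actual code (which does a Get/Put against the table instead).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the job-ID dedup check: skip a record on resume if the
// same job already processed it before the pause. A HashMap stands
// in for the HBase table used as both input and output.
public class JobIdDedup {
    private final Map<String, String> lastJobIdByRow = new HashMap<>();

    // True if this record was already processed by the current job run.
    public boolean alreadyProcessed(String rowKey, String currentJobId) {
        return currentJobId.equals(lastJobIdByRow.get(rowKey));
    }

    // Record the job ID after a record is successfully processed.
    public void markProcessed(String rowKey, String currentJobId) {
        lastJobIdByRow.put(rowKey, currentJobId);
    }

    public static void main(String[] args) {
        JobIdDedup dedup = new JobIdDedup();
        String job = "job_201112121450_0001";
        System.out.println(dedup.alreadyProcessed("row1", job)); // false
        dedup.markProcessed("row1", job);
        System.out.println(dedup.alreadyProcessed("row1", job)); // true
    }
}
```

After a resume, the mapper would call alreadyProcessed() first and emit nothing for records that return true; a different job ID (i.e. a genuinely new job) still reprocesses everything.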

Our custom Fair Scheduler implementation has this logic built in and is
deployed on our cluster.

Please share your comments and concerns about this approach.

Regards,
dino

Re: Pause and Resume Hadoop map reduce job

Posted by Arun C Murthy <ac...@hortonworks.com>.
The CapacityScheduler (hadoop-0.20.203 onwards) allows you to stop a queue and start it again.

That will give you the behavior you described.
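For reference, stopping a queue works roughly like this in 0.20.203-era MRv1 (a sketch; verify the exact property name against your version's CapacityScheduler documentation):

```xml
<!-- Queue configuration (sketch): mark the queue stopped so it
     accepts no new work; set back to "running" to resume. -->
<property>
  <name>mapred.queue.myqueue.state</name>
  <value>stopped</value>
</property>
```

The queue configuration can then be reloaded without restarting the JobTracker, e.g. via `hadoop mradmin -refreshQueues`.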

Arun
