Posted to mapreduce-user@hadoop.apache.org by Ersoy Bayramoglu <er...@yale.edu> on 2010/05/04 01:55:51 UTC

Aborting a MapReduce job

Hi,

I'm a new user. I have a question about aborting an ongoing MapReduce job. If
one of the mappers computes a particular value, I'd like to stop the entire job
and give control back to the master. Is this possible in Hadoop?

Re: Aborting a MapReduce job

Posted by David Rosenstrauch <da...@darose.net>.
On 05/03/2010 07:55 PM, Ersoy Bayramoglu wrote:
> Hi,
>
> I'm a new user. I have a question about aborting an ongoing MapReduce job. If
> one of the mappers computes a particular value, I'd like to stop the entire job
> and give control back to the master. Is this possible in Hadoop?

Can't you just throw an exception inside the mapper?  I think that would 
kill the job.  It wouldn't stop any job tasks currently in progress, but 
presumably it would prevent new tasks from getting launched.
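
For example (a minimal sketch of a new-API mapper; the sentinel check and the
type parameters are made up for illustration):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class AbortingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Hypothetical abort condition: the record carries the special value.
        if (value.toString().contains("POISON")) {
            throw new IOException("abort requested by " + context.getTaskAttemptID());
        }
        // ... normal map logic would go here ...
    }
}

One caveat: Hadoop retries a failed task (mapred.map.max.attempts, 4 by
default) before failing the whole job, so the abort isn't immediate unless
you lower that setting.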

If that's not sufficient, and you absolutely need to completely stop all
currently-in-progress map tasks, then I think the only solution would be
to use ZooKeeper, i.e., make your map/reduce tasks periodically check
for the presence of some abort node in ZK.
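
Something along these lines (just a sketch; the connect string, znode path,
and class name are placeholders, and error handling is omitted):

import java.io.IOException;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;

public class AbortFlag {
    private final ZooKeeper zk;

    public AbortFlag(String connectString) throws IOException {
        this.zk = new ZooKeeper(connectString, 3000, null); // no watcher needed for polling
    }

    // The task that hits the special value would create the znode via
    // zk.create(...); every other task calls this every N records and
    // bails out of its record loop once the znode appears.
    public boolean shouldAbort() throws KeeperException, InterruptedException {
        return zk.exists("/jobs/myjob/abort", false) != null;
    }
}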

HTH,

DR

Re: Aborting a MapReduce job

Posted by Amogh Vasekar <am...@yahoo-inc.com>.
Hi,
A hack that immediately comes to mind is having the mapper touch a predetermined file path and using that as the signal to clean up; a rough sketch follows. Alternatively, check the RunningJob interface available via JobClient; you can monitor and kill tasks from there too.
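
A rough sketch of the marker-file idea (the HDFS path and class name are
arbitrary; poll every few thousand records rather than per record):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AbortMarker {
    private static final Path MARKER = new Path("/tmp/myjob.abort"); // placeholder path

    // The mapper that computes the special value touches the marker file...
    public static void raise(Configuration conf) throws IOException {
        FileSystem.get(conf).create(MARKER, true).close();
    }

    // ...and every task polls for it, returning early from run()/map() if set.
    public static boolean raised(Configuration conf) throws IOException {
        return FileSystem.get(conf).exists(MARKER);
    }
}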

Amogh


On 5/4/10 9:46 AM, "Ersoy Bayramoglu" <er...@yale.edu> wrote:

Thanks, I see. Alternatively, is there a way for the mapper to notify the
master that something unusual has happened? If I can print something on the
screen when this happens, I can abort manually too. I want to measure the time
(preferably without waiting for the concurrently executing mappers to finish).

Quoting "Bae, Jae Hyeon" <me...@gmail.com>:

> I tried to find the way you wanted, but I couldn't.
>
> As I know, hadoop framework doesn't provide the function to stop the
> entire job in the specific moment.
>
> I like to suggest to use Counter. For example, if you want to stop the
> entire job, you can set the Counter on. Every mapper instance should
> check the value of Counter before proceeding, if the value of Counter
> is on, running mapper should stop the execution. When remaining mapper
> instances is picked up by job tracker and starts running, they check
> the value of Counter on and return immediately.
>
> Look at the following sample code, this is code snippet of my mapper
> class. In this code, I am using configuration object, but you can
> change this implementation to using counter.
>
>        @Override
>       public void run(Context context) throws IOException, InterruptedException {
>               boolean run = false;
>               long start = context.getConfiguration().getLong("debug", -1);
>               if (start != -1) {
>                       if (start == ((RAGZIPFileSplit)context.getInputSplit()).getStart()) {
>                               run = true;
>                       } else {
>                               run = false;
>                       }
>               } else {
>                       run = true;
>               }
>
>               if (run) {
>                       setup(context);
>                       while (context.nextKeyValue()) {
>                               map(context.getCurrentKey(), context.getCurrentValue(), context);
>                       }
>                       cleanup(context);
>               }
>       }
>
> 2010/5/4 Ersoy Bayramoglu <er...@yale.edu>:
>> Hi,
>>
>> I'm a new user. I have a question about aborting an ongoing
>> mapreduce job. If
>> one of the mappers compute a particular value, I'd like to stop the
>> entire job,
>> and give the control back to the master. Is this possible in Hadoop?
>>
>




Re: Aborting a MapReduce job

Posted by Ersoy Bayramoglu <er...@yale.edu>.
Thanks, I see. Alternatively, is there a way for the mapper to notify the
master that something unusual has happened? If I can print something on the
screen when this happens, I can abort manually too. I want to measure the time
(preferably without waiting for the concurrently executing mappers to finish).

Quoting "Bae, Jae Hyeon" <me...@gmail.com>:

> I tried to find the way you wanted, but I couldn't.
>
> As far as I know, the Hadoop framework doesn't provide a way to stop
> the entire job at a specific moment.
>
> I'd suggest using a Counter. For example, if you want to stop the
> entire job, you can set the Counter on. Every mapper instance should
> check the value of the Counter before proceeding; if the Counter is
> on, the running mapper should stop executing. When the remaining
> mapper instances are picked up by the job tracker and start running,
> they see that the Counter is on and return immediately.

Re: Aborting a MapReduce job

Posted by "Bae, Jae Hyeon" <me...@gmail.com>.
I tried to find the way you wanted, but I couldn't.

As far as I know, the Hadoop framework doesn't provide a way to stop the
entire job at a specific moment.

I'd suggest using a Counter. For example, if you want to stop the entire
job, you can set the Counter on. Every mapper instance should check the
value of the Counter before proceeding; if the Counter is on, the running
mapper should stop executing. When the remaining mapper instances are
picked up by the job tracker and start running, they see that the Counter
is on and return immediately.

Look at the following sample code; it is a snippet of my mapper class. In
this code I am using the configuration object, but you can change the
implementation to use a counter.

        // This overrides Mapper.run() from the new (org.apache.hadoop.mapreduce)
        // API; RAGZIPFileSplit is a custom InputSplit of ours.
        @Override
        public void run(Context context) throws IOException, InterruptedException {
                // Run this task only when no "debug" start offset is configured,
                // or when this task's input split begins at that offset.
                long start = context.getConfiguration().getLong("debug", -1);
                boolean run = (start == -1)
                        || start == ((RAGZIPFileSplit) context.getInputSplit()).getStart();

                if (run) {
                        setup(context);
                        while (context.nextKeyValue()) {
                                map(context.getCurrentKey(), context.getCurrentValue(), context);
                        }
                        cleanup(context);
                }
        }
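
One wrinkle with the Counter variant: a running task can't read the job-wide
aggregated counter values back through the standard API, so the check is
easiest from the job driver. A rough sketch (class name and counter
group/name are invented; the RunningJob comes from JobClient.submitJob):

import java.io.IOException;
import org.apache.hadoop.mapred.RunningJob;

public class AbortWatcher {
    // Mapper side (new API): context.getCounter("myapp", "ABORT").increment(1);
    // Driver side: obtain the RunningJob from JobClient.submitJob(conf), then:
    public static void watch(RunningJob job) throws IOException, InterruptedException {
        while (!job.isComplete()) {
            long hits = job.getCounters().findCounter("myapp", "ABORT").getCounter();
            if (hits > 0) {
                job.killJob(); // kills the whole job, including in-flight tasks
                break;
            }
            Thread.sleep(5000); // poll every few seconds
        }
    }
}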

2010/5/4 Ersoy Bayramoglu <er...@yale.edu>:
> Hi,
>
> I'm a new user. I have a question about aborting an ongoing MapReduce job. If
> one of the mappers computes a particular value, I'd like to stop the entire job
> and give control back to the master. Is this possible in Hadoop?
>