Posted to common-user@hadoop.apache.org by praveenesh kumar <pr...@gmail.com> on 2012/01/30 08:06:15 UTC

Killing hadoop jobs automatically

Is there any way we can kill Hadoop jobs that are taking too long to
execute?

What I want to achieve is this: if a job has been running for longer than
"_some_predefined_timeout_limit", it should be killed automatically.

Is it possible to achieve this through shell scripts, or in any other way?

Thanks,
Praveenesh

Re: Killing hadoop jobs automatically

Posted by Harsh J <ha...@cloudera.com>.
In the current stable releases, this is available at the task level, with
a default of 10 minutes of non-responsiveness per task. It is controlled
per job via "mapred.task.timeout".

There is no built-in feature that lets you monitor and set a timeout on
the job execution itself, however. How do you imagine this being useful
compared to the per-task timeouts, which help unstick jobs, or eventually
fail them, when they are improperly written and hang without reporting any
status for the timeout period? That said, a job-level timeout should be
easy to script from the outside.
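
For instance, here is a rough, untested sketch of such a script, run
periodically from cron. It assumes the Hadoop 1.x "hadoop job -list"
output format (JobId, State, StartTime in epoch milliseconds, ...), so
verify the column layout on your version before relying on it:

    #!/bin/bash
    # Kill any running job older than TIMEOUT_SECS -- untested sketch;
    # TIMEOUT_SECS is just an example value, tune it for your cluster.
    TIMEOUT_SECS=$((2 * 60 * 60))   # 2 hours
    NOW_MS=$(($(date +%s) * 1000))

    # Hadoop 1.x "hadoop job -list" columns:
    # JobId  State  StartTime  UserName  Priority  SchedulingInfo
    hadoop job -list 2>/dev/null | grep '^job_' | \
    while read JOBID STATE START_MS REST; do
      AGE_SECS=$(( (NOW_MS - START_MS) / 1000 ))
      if [ "$AGE_SECS" -gt "$TIMEOUT_SECS" ]; then
        echo "killing $JOBID (running for ${AGE_SECS}s)"
        hadoop job -kill "$JOBID"
      fi
    done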


-- 
Harsh J
Customer Ops. Engineer, Cloudera

Re: Killing hadoop jobs automatically

Posted by Masoud <ma...@agape.hanyang.ac.kr>.
Dear Praveenesh

I think there are only two ways to kill a job:
1- The kill command (not a perfect way, because you need to know the job
id; see below for one way to look it up).
2- mapred.task.timeout (in the "bin/hadoop jar" command, pass
-Dmapred.task.timeout=<value> with your desired value in milliseconds).
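
For point 1, the job id does not have to be known in advance; it can be
looked up first (the job id below is made up):

    hadoop job -list                        # prints JobId, State, StartTime, ...
    hadoop job -kill job_201201300001_0001  # then kill the offending job by id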

This sometimes happens for me too: not on all machines, but on some
particular machines jobs execute more slowly than on others, which I think
is caused by hardware problems. As far as I know, shuffling is done by
Hadoop itself, and we can only influence it by setting the output format
class. Be aware that it is normal for some jobs to finish later than
others, so do not be too sensitive about it; Hadoop manages all these
things, and the overall result is our goal in Hadoop-based computation.

I hope this helps.

Good Luck,
Masoud,



Re: Killing hadoop jobs automatically

Posted by praveenesh kumar <pr...@gmail.com>.
@ Harsh -

Yeah, mapred.task.timeout is the valid option, but for some reason it is
not behaving the way it should, and I am not sure what the cause could be.
The thing is, my jobs are running fine; they are just sometimes (not
always) slow in the shuffle phase. So I was thinking, as an admin, can we
control the running jobs, just as a test, where we simply kill the jobs
that are taking too long to execute -- not only the jobs that are hanging,
but also the jobs taking more execution time than expected? The problem in
my case is that the end users do not want to go through the pain of
managing and controlling jobs on Hadoop. They want all this job handling
to happen automatically, which is what made me think this way (and I know
it is not the best way).

Anyway, going away from the topic: is there any way I can improve my
shuffling (through configuration parameters only, given that the users do
not know about minimizing the number of key/value pairs)?
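
For what it is worth, these are the shuffle-related parameters I have
found and plan to experiment with; the values are only guesses to start
from, not tested recommendations (myjob.jar and MyJobClass are
placeholders):

    # compress map output to shrink shuffle traffic, enlarge the map-side
    # sort buffer (it must still fit in the map task heap), merge more
    # spill files per pass, and let each reducer fetch more map outputs
    # in parallel
    hadoop jar myjob.jar MyJobClass \
      -Dmapred.compress.map.output=true \
      -Dio.sort.mb=200 \
      -Dio.sort.factor=25 \
      -Dmapred.reduce.parallel.copies=10 \
      input output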

Thanks,
Praveenesh


Re: Killing hadoop jobs automatically

Posted by Masoud <ma...@agape.hanyang.ac.kr>.
Hi,

Every map/reduce task has a Reporter through which it reports progress.
You can set the configuration parameter mapred.task.timeout to your
desired value; a task that does not report progress for that long gets
killed.

Good Luck.



Re: Killing hadoop jobs automatically

Posted by praveenesh kumar <pr...@gmail.com>.
Yeah, I am aware of that, but it requires you to explicitly monitor the
job, look up the job id, and then run the hadoop job -kill command.
What I want to know is: is there any way to do all this automatically, by
providing some timer or something, so that if my job takes more than some
predefined time it gets killed automatically?

Thanks,
Praveenesh


Re: Killing hadoop jobs automatically

Posted by Prashant Kommireddi <pr...@gmail.com>.
You might want to take a look at the kill command: "hadoop job -kill
<jobid>".

Prashant
