Posted to mapreduce-user@hadoop.apache.org by Chase Bradford <ch...@gmail.com> on 2010/09/12 08:38:47 UTC

custom task cleanup even when task is killed?

I have a mapper class (extended from mapreduce.Mapper), where setup
reports to an outside resource.  I want to make sure that most of the
time when the task fails or is killed, a specific chunk of cleanup
code is executed.  If it's a system or framework error, then I'm
alright with a resource leak.  I just don't want task preemption or
trivial bugs in the map code to leave loose ends, since those happen
pretty often.

I tried to achieve this by overriding the run method, and wrapping the
call to super.run(context) in a try/finally block.  However, it
doesn't seem like the finally block is actually executed when I kill a
specific task.

Does anyone know how the task is told to quit when it is killed?  I
assumed it was by raising an InterruptedException, which should
trigger the finally block, but that doesn't seem to be the case.

Thanks,
Chase

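Chase's assumption can be checked in isolation. The sketch below uses plain Java with no Hadoop, and the class name InterruptDemo is hypothetical. It interrupts a sleeping worker thread. The resulting InterruptedException unwinds the worker's stack, so its finally block runs. If a killed task never reaches its finally block, that suggests the kill is not delivered as a thread interrupt at all, and the later replies in this thread point that way.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class: shows that a thread interrupt reaches finally.
class InterruptDemo {

    // Returns true if the worker's finally block ran after it was interrupted.
    static boolean finallyRunsOnInterrupt() {
        AtomicBoolean cleanedUp = new AtomicBoolean(false);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for a long-running map() loop
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            } finally {
                cleanedUp.set(true); // the "very important cleanup"
            }
        });
        worker.start();
        worker.interrupt(); // what Chase expected a task kill to do
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return cleanedUp.get();
    }

    public static void main(String[] args) {
        System.out.println("finally ran: " + finallyRunsOnInterrupt());
    }
}
```
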
Re: custom task cleanup even when task is killed?

Posted by Ted Yu <yu...@gmail.com>.
Since I don't know what cleanup you're doing, just a reminder that you may
run into:
https://issues.apache.org/jira/browse/HADOOP-4829

On Mon, Sep 13, 2010 at 1:43 PM, Chase Bradford <ch...@gmail.com> wrote:

> Not yet :)  I forgot about that one.
>
> Thanks Ted.

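Here is a sketch of one way to make the cleanup safe to call from more than one place, under stated assumptions. The JVM starts registered shutdown hooks in no particular order and runs them concurrently, per the Runtime.addShutdownHook documentation. As far as I know, Hadoop's FileSystem cache also registers its own hook that closes cached file systems. A cleanup hook that writes to HDFS can therefore find its FileSystem already closed, which appears to be the interaction HADOOP-4829 deals with. If the same cleanup can also be reached from a finally block, it should run at most once. The OnceCleanup class below is hypothetical and enforces that with an AtomicBoolean.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper: wraps the cleanup so that it runs at most once,
// whichever path (finally block or shutdown hook) reaches it first.
class OnceCleanup implements Runnable {
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final Runnable action;

    OnceCleanup(Runnable action) {
        this.action = action;
    }

    @Override
    public void run() {
        // compareAndSet is atomic, so two racing callers cannot both win.
        if (done.compareAndSet(false, true)) {
            action.run();
        }
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        OnceCleanup cleanup = new OnceCleanup(calls::incrementAndGet);
        cleanup.run(); // e.g. from the finally block in run()
        cleanup.run(); // e.g. from a shutdown hook that fires afterwards
        System.out.println("cleanup ran " + calls.get() + " time(s)");
    }
}
```
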
Re: custom task cleanup even when task is killed?

Posted by Chase Bradford <ch...@gmail.com>.
Not yet :)  I forgot about that one.

Thanks Ted.

On Mon, Sep 13, 2010 at 1:39 PM, Ted Yu <yu...@gmail.com> wrote:
> Have you tried this ?
>         Runtime.getRuntime().addShutdownHook(new ShutdownThread());

-- 
Chase Bradford


“If in physics there's something you don't understand, you can always
hide behind the uncharted depths of nature. But if your program
doesn't work, there is no obstinate nature. If it doesn't work, you've
messed up.”

- Edsger Dijkstra

Re: custom task cleanup even when task is killed?

Posted by Ted Yu <yu...@gmail.com>.
Have you tried this?
        Runtime.getRuntime().addShutdownHook(new ShutdownThread());


On Mon, Sep 13, 2010 at 1:33 PM, Chase Bradford <ch...@gmail.com> wrote:

> If the task fails by throwing an unhandled exception, then cleanup()
> is skipped (the default run doesn't call it), but the finally code
> still works.  However, if the task tracker kills the task, then the
> finally block is skipped.  I'm reluctant to put in signal handling
> code to catch the TERM signal, as that's not Java standard.

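Ted's one-liner leaves ShutdownThread undefined. Below is a minimal sketch of what it might look like. The class body, the Runnable constructor argument, and the HookDemo helper are assumptions, not code from the thread. Some JVM facts bear on whether it helps. Hooks run when the JVM begins an orderly shutdown: normal exit, System.exit, or a SIGTERM/SIGINT from the operating system. They do not run on SIGKILL or Runtime.halt. During that shutdown the JVM does not unwind other threads' stacks, which would explain why the finally block in run() is skipped when the task is killed. The hook therefore only helps if the task's JVM receives TERM and has time to exit before any hard kill.

```java
// Hypothetical: ShutdownThread is not defined in the thread; this is one
// possible shape. A shutdown hook is simply an unstarted Thread.
class ShutdownThread extends Thread {
    private final Runnable cleanup;

    ShutdownThread(Runnable cleanup) {
        super("task-cleanup-hook");
        this.cleanup = cleanup;
    }

    @Override
    public void run() {
        // Keep this short and self-contained: other hooks (for example one
        // that closes cached FileSystems) may be running at the same time.
        cleanup.run();
    }
}

class HookDemo {
    // Registers a hook, as a task might in setup(), then de-registers it as
    // the normal cleanup path would, so the hook never repeats the work.
    // Returns true if the hook was registered and then successfully removed.
    static boolean registerThenRemove() {
        ShutdownThread hook =
                new ShutdownThread(() -> System.err.println("running task cleanup"));
        Runtime.getRuntime().addShutdownHook(hook);
        // ... the task's real work would happen here ...
        return Runtime.getRuntime().removeShutdownHook(hook);
    }

    public static void main(String[] args) {
        System.out.println("hook removed: " + registerThenRemove());
    }
}
```

Removing the hook once the normal path has cleaned up keeps it from running the same work a second time at JVM exit.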
Re: custom task cleanup even when task is killed?

Posted by Chase Bradford <ch...@gmail.com>.
Thanks David,

Unfortunately, that's only called when a task finishes consuming input
successfully.  My issue deals with tasks that are killed (job is
killed or task is pre-empted by the scheduler).  I tried overriding
run() as follows:

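// In a subclass of org.apache.hadoop.mapreduce.Mapper (needs java.io.IOException).
@Override
public void run(Context context) throws IOException, InterruptedException {
  try {
    super.run(context);
  } finally {
    // my very important cleanup stuff that should very rarely get missed.
  }
}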

If the task fails by throwing an unhandled exception, then cleanup()
is skipped (the default run doesn't call it), but the finally code
still works.  However, if the task tracker kills the task, then the
finally block is skipped.  I'm reluctant to add signal-handling
code to catch the TERM signal, as that isn't standard Java.

Thanks Again,
Chase

On Mon, Sep 13, 2010 at 11:28 AM, David Rosenstrauch <da...@darose.net> wrote:
> Just like there's a "setup(Mapper.Context context)" method, there's also a
> "cleanup(Mapper.Context context)" method for just this purpose.
>
> See:
> http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)
>
> DR
>

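The behaviour Chase describes, where cleanup() is skipped when map() throws, follows from the default run() calling cleanup() only after the input loop completes. To show this without a Hadoop dependency, the sketch below uses a hypothetical MiniMapper that mimics that shape. The real Mapper methods take a Context and declare IOException and InterruptedException. A GuardedMapper variant then calls cleanup() from a finally block. Neither version helps when the JVM itself is terminated.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.mapreduce.Mapper (not the real
// class): setup, map every record, then cleanup, with nothing guarding the loop.
class MiniMapper {
    final List<String> log = new ArrayList<>();

    void setup() { log.add("setup"); }

    // Simulates a trivial bug: an empty record makes map() throw.
    void map(String record) {
        if (record.isEmpty()) {
            throw new IllegalArgumentException("empty record");
        }
        log.add("map " + record);
    }

    void cleanup() { log.add("cleanup"); }

    void run(Iterator<String> input) {
        setup();
        while (input.hasNext()) {
            map(input.next());
        }
        cleanup(); // never reached if map() throws
    }
}

// Calls cleanup() from a finally block, so it also runs when map() throws.
class GuardedMapper extends MiniMapper {
    @Override
    void run(Iterator<String> input) {
        setup();
        try {
            while (input.hasNext()) {
                map(input.next());
            }
        } finally {
            cleanup();
        }
    }
}

class MapperDemo {
    // Runs the mapper, records the map() failure, and returns the call log.
    static List<String> runAndCatch(MiniMapper mapper, String... records) {
        try {
            mapper.run(Arrays.asList(records).iterator());
        } catch (IllegalArgumentException e) {
            mapper.log.add("failed: " + e.getMessage());
        }
        return mapper.log;
    }

    public static void main(String[] args) {
        // prints [setup, map a, failed: empty record]
        System.out.println(runAndCatch(new MiniMapper(), "a", "", "b"));
        // prints [setup, map a, cleanup, failed: empty record]
        System.out.println(runAndCatch(new GuardedMapper(), "a", "", "b"));
    }
}
```
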
Re: custom task cleanup even when task is killed?

Posted by David Rosenstrauch <da...@darose.net>.
On 09/12/2010 02:38 AM, Chase Bradford wrote:
> I have a mapper class (extended from mapreduce.Mapper), where setup
> reports to an outside resource.  I want to make sure that most of the
> time when the task fails or is killed, a specific chunk of cleanup
> code is executed.

Just like there's a "setup(Mapper.Context context)" method, there's also 
a "cleanup(Mapper.Context context)" method for just this purpose.

See: 
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapreduce/Mapper.html#cleanup(org.apache.hadoop.mapreduce.Mapper.Context)

DR