Posted to mapreduce-user@hadoop.apache.org by David Rosenstrauch <da...@darose.net> on 2011/02/03 17:39:12 UTC

Job-wide cleanup functionality?

Perhaps this has been covered before, but I wasn't able to dig up any info.

Is there any way to run a custom "job cleanup" for a map/reduce job?  I 
know that each map and reduce has a cleanup method, which can be used to 
clean up at the end of each task.  But what I want is to run a single 
cleanup step that runs after all the reducers complete.  Is there any 
way to do this in M/R?  I didn't see one.  Would be nice to have an API 
like this to work with:

job.setCleanupClass(MyCleanupClass.class)

Also, if no such functionality exists, what's my next best workaround to 
achieve this on a M/R job?

Thanks,

DR

Re: Job-wide cleanup functionality?

Posted by Friso van Vollenhoven <fv...@xebia.com>.
There is this config option:
<property>
 <name>job.end.notification.url</name>
 <value>http://localhost:8080/jobstatus.php?jobId=$jobId&amp;jobStatus=$jobStatus</value>
 <description>Indicates url which will be called on completion of job to inform
              end status of job.
              User can give at most 2 variables with URI : $jobId and $jobStatus.
              If they are present in URI, then they will be replaced by their
              respective values.
</description>
</property>

That might help you. I have never used it, because it requires running something that listens on HTTP, and that is one more dependency that might fail at some point. As far as I know there is no API that achieves the same thing (I spent some time looking for one). If you need to be absolutely sure that your code runs after the job (as long as the job tracker is still running, of course), then you'd need to create a patch, I guess...
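
For illustration, here is a minimal sketch of what the listening side could look like, using the HTTP server that ships with the JDK. The class name, the /jobstatus path, port 8080 and the runCleanup hook are all made up for the example, and the status strings are my assumption about what gets substituted for $jobStatus. Treat it as a starting point under those assumptions, not as a tested setup.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.InetSocketAddress;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

// Hypothetical listener for the job.end.notification.url callback.
public class JobEndListener {

    // Splits a raw query string such as "jobId=x&jobStatus=y" into a map.
    static Map<String, String> parseQuery(String rawQuery)
            throws UnsupportedEncodingException {
        Map<String, String> params = new HashMap<String, String>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(key, "UTF-8"),
                       URLDecoder.decode(value, "UTF-8"));
        }
        return params;
    }

    // Hypothetical hook: put the job-wide cleanup logic here.
    static void runCleanup(String jobId, String jobStatus) {
        System.out.println("Cleaning up after " + jobId + " (" + jobStatus + ")");
    }

    public static void main(String[] args) throws IOException {
        // Port and path are assumptions; they must match the configured URL.
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/jobstatus", exchange -> {
            Map<String, String> params =
                    parseQuery(exchange.getRequestURI().getRawQuery());
            runCleanup(params.get("jobId"), params.get("jobStatus"));
            exchange.sendResponseHeaders(200, -1); // 200 OK, no response body
            exchange.close();
        });
        server.start();
    }
}
```

With this running, the property value would be something like http://localhost:8080/jobstatus?jobId=$jobId&amp;jobStatus=$jobStatus. The cleanup only happens if the listener is up and reachable when the job ends, which is exactly the extra dependency mentioned above.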


Friso


