Posted to common-user@hadoop.apache.org by Randy Fox <rf...@connexity.com> on 2016/02/04 00:42:34 UTC

Preemption and the rerunning of mappers

We have a cluster where we run both quick and long-running jobs.  The long-running jobs have mappers that take 1+ hours and reducers that take 1+ hours, and together they can take all the resources on our cluster.

We want a configuration where the quick jobs get resources and are not blocked by the long-running jobs.  We are using the Fair Scheduler, with the long jobs in one queue and the short jobs in another queue.
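
For context, the allocation file looks roughly like the sketch below (queue names and weights are simplified placeholders, not our exact values):

  <?xml version="1.0"?>
  <allocations>
    <!-- hour-plus mappers and reducers go here -->
    <queue name="long">
      <weight>1.0</weight>
    </queue>
    <!-- quick jobs that should not be starved by the long queue -->
    <queue name="short">
      <weight>1.0</weight>
    </queue>
  </allocations>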

It seems preemption with minResources is the only way to make sure the quick jobs get cluster time.  But not only does the preemptor kill tasks that have been running for a while (gripe: it does not seem to target the tasks that have been running the shortest), but when it preempts a reducer, mappers have to be rerun because their intermediate data is lost, and the remaining reducers sit waiting, wasting resources and causing more preemption.  This creates a nasty cycle and the job never finishes.
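
The preemption pieces involved are roughly the following (the numbers are illustrative, not what we actually run): preemption is enabled in yarn-site.xml, and the short queue gets a minResources guarantee plus a minSharePreemptionTimeout in the allocation file.

  In yarn-site.xml:

    <property>
      <name>yarn.scheduler.fair.preemption</name>
      <value>true</value>
    </property>

  In the allocation file, for the short queue:

    <queue name="short">
      <minResources>20000 mb,20 vcores</minResources>
      <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
    </queue>

Once the short queue has been below its min share for the timeout, the scheduler starts killing containers from the long queue, which is where the rerun-mapper cycle begins.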

I am looking for suggestions on how to make this work.   Is there a way to have the mappers not delete their intermediate data until the job finishes?  Is there a way to have the preemptor kill the shortest-running tasks?  Or am I approaching this entirely wrong?

Thanks in advance for any and all advice.

Cheers,

Randy

PS: Getting a larger cluster is not an option.