Posted to common-user@hadoop.apache.org by Wojciech Biela <il...@gmail.com> on 2011/11/23 14:52:25 UTC

how to manually fail a task that ran on a failed tasktracker

Hi

I need to manually fail a task (a tip, i.e. a TaskInProgress), probably by
failing its task attempts, when its tasktracker suddenly becomes unavailable.
I want to do this from FairScheduler.update() (my own scheduler extends
FairScheduler and overrides its update() method).
The problem is that when I do exactly what JobClient usually does when you
fail a task from the command line:

taskTrackerManager.killTask(taskAttemptId, true);

after the tasktracker process has been killed, the job hangs.

I simulate this by:
1. running the job;
2. killing the tasktracker process while the job runs;
3. detecting this in my version of the FairScheduler.update() method and
failing the task, so that the JobTracker can proceed without waiting for
the global timeout.
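Step 3 can be modeled in isolation. The sketch below is a self-contained Java model, not Hadoop code: LostTrackerSweep, AttemptKiller, heartbeat(), assign() and sweep() are hypothetical names, and the expiry threshold is an arbitrary assumption. AttemptKiller only stands in for the taskTrackerManager.killTask(taskAttemptId, true) call quoted earlier. It shows the bookkeeping a custom update() would need (when each tracker was last heard from and which attempts it was running); it does not by itself explain or fix why the job still hangs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only: none of these types are Hadoop classes.
class LostTrackerSweep {

    /** Stand-in for TaskTrackerManager.killTask(attemptId, shouldFail). */
    interface AttemptKiller {
        void killTask(String attemptId, boolean shouldFail);
    }

    private final long expiryMillis; // assumed threshold for "tracker is gone"
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final Map<String, List<String>> runningAttempts = new HashMap<>();

    LostTrackerSweep(long expiryMillis) {
        this.expiryMillis = expiryMillis;
    }

    /** Record that a tracker reported in at time 'now'. */
    void heartbeat(String tracker, long now) {
        lastHeartbeat.put(tracker, now);
    }

    /** Record that an attempt was scheduled on a tracker. */
    void assign(String tracker, String attemptId) {
        runningAttempts.computeIfAbsent(tracker, t -> new ArrayList<>()).add(attemptId);
    }

    /**
     * Meant to be called from a periodic update(): every attempt on a tracker
     * silent for longer than expiryMillis is passed to the killer with
     * shouldFail = true. Returns the ids of the attempts that were failed.
     */
    List<String> sweep(long now, AttemptKiller killer) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (now - e.getValue() <= expiryMillis) {
                continue; // tracker still considered alive
            }
            // Remove so the same attempts are not failed again on the next sweep.
            List<String> attempts = runningAttempts.remove(e.getKey());
            if (attempts == null) {
                continue;
            }
            for (String id : attempts) {
                killer.killTask(id, true);
                failed.add(id);
            }
        }
        return failed;
    }
}
```

For example, with a 10-second expiry, a tracker last heard from at t=0 has its attempts failed by a sweep at t=20s, while a tracker that reported at t=15s is left alone.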

What happens now when I do the above (for all task attempts of this
TIP) is that the task attempts are marked as failed/killed, but the task
still hangs.
Additionally calling tip.kill() after failing all task attempts does not
change this behavior.

Do you have any ideas on how to fail this task more or less instantly and
let the job proceed instead of hanging?

-- 
Wojtek