Posted to dev@nutch.apache.org by Stefan Groschupf <sg...@media-style.com> on 2005/10/10 15:34:26 UTC
reprocessing hanging tasks
Hi,
I tried to understand the jobtracker code.
Hmm, more than 1000 lines of code in just one class. :-( This makes
the code very difficult to understand.
Anyway, I'm missing a mechanism to reprocess hanging tasks. Maybe I just
didn't find the code, but I did invest some time looking for it.
As the Google paper describes, the original MapReduce reprocesses tasks
that may still be running but are much slower than the other tasks
because of hardware failures.
Since I have noticed that the task-tracker isn't that stable yet, I
would really love to have such a reprocessing mechanism.
Actually, I have seen that tasks are reprocessed when the task-tracker
crashes and no longer sends any reports, or when the task-tracker
reports a task failure.
But if, for example, the network speed of a fetching map task is very
slow, the job itself takes forever.
I would suggest adding a start time and a finish time to the task
object and setting these values when the status changes.
We can then calculate the average time a task needs for processing
based on these values.
Then we have a configurable minimum share of finished tasks before we
start reprocessing tasks. For example, 80% of the tasks need to be
finished.
Furthermore, we have a configurable threshold value: if the processing
time of a task exceeds threshold * average processing time, we simply
reprocess the task on another tasktracker.
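The proposal above can be sketched roughly as follows. This is only an illustration, not code from the Nutch jobtracker: the class SlowTaskDetector, the TaskTiming record, and both configuration parameters are hypothetical names, and "exceeds threshold * average" is one reading of the rule described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names exist in the Nutch jobtracker.
final class SlowTaskDetector {

    /** Minimal task record: start time plus finish time (-1 while still running). */
    static final class TaskTiming {
        final String id;
        final long startMillis;
        final long finishMillis; // -1 means "still running"

        TaskTiming(String id, long startMillis, long finishMillis) {
            this.id = id;
            this.startMillis = startMillis;
            this.finishMillis = finishMillis;
        }

        boolean isFinished() {
            return finishMillis >= 0;
        }
    }

    private final double minFinishedFraction; // assumed config value, e.g. 0.8
    private final double slownessThreshold;   // assumed config value, e.g. 3.0

    SlowTaskDetector(double minFinishedFraction, double slownessThreshold) {
        this.minFinishedFraction = minFinishedFraction;
        this.slownessThreshold = slownessThreshold;
    }

    /** Returns the ids of running tasks that should be started again elsewhere. */
    List<String> findSlowTasks(List<TaskTiming> tasks, long nowMillis) {
        List<String> slow = new ArrayList<>();
        long finished = 0;
        long totalDuration = 0;
        for (TaskTiming t : tasks) {
            if (t.isFinished()) {
                finished++;
                totalDuration += t.finishMillis - t.startMillis;
            }
        }
        // Do nothing until enough tasks have finished to give a useful average.
        if (finished == 0 || (double) finished / tasks.size() < minFinishedFraction) {
            return slow;
        }
        double average = (double) totalDuration / finished;
        for (TaskTiming t : tasks) {
            long elapsed = nowMillis - t.startMillis;
            if (!t.isFinished() && elapsed > slownessThreshold * average) {
                slow.add(t.id);
            }
        }
        return slow;
    }
}
```

A jobtracker-like scheduler could call findSlowTasks periodically and hand each returned id to a different tasktracker, keeping whichever copy finishes first.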
What do people think?
Did I miss the section in the jobtracker where this is done, or are
people interested in me submitting a patch that implements this mechanism?
Stefan
Re: reprocessing hanging tasks
Posted by Doug Cutting <cu...@nutch.org>.
Stefan Groschupf wrote:
> Maybe we misunderstand each other: I do not mean tasks that crash, I mean
> tasks that are 20 times slower on one machine than the other tasks on the
> other machines.
Ah, I call that "speculative re-execution". Nutch does not yet
implement that.
I don't think speculative re-execution of tasks would help much with
fetching, since a fetch task that is slow on one machine will probably
be slow on another. What would probably make the fetcher faster is to
use Thread.kill() on fetcher threads which have exceeded a timeout, and
then replace them with a new Fetcher thread.
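One way to picture the timeout idea is the sketch below. It is not the Nutch Fetcher: TimedFetch and runWithTimeout are hypothetical names. Note that java.lang.Thread has no kill() method, so this sketch swaps in interruption through Future.cancel(true) instead. Interruption is cooperative, so a thread blocked in a non-interruptible socket read may not stop right away.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper; not part of the Nutch fetcher.
final class TimedFetch {

    /**
     * Runs one fetch on the pool and waits at most timeoutMillis for it.
     * On timeout the worker is interrupted and the fallback value is returned,
     * so the caller can move on to the next URL.
     */
    static <T> T runWithTimeout(ExecutorService pool, Callable<T> fetch,
                                long timeoutMillis, T fallback) {
        Future<T> future = pool.submit(fetch);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupts the thread running the fetch
            return fallback;
        } catch (ExecutionException e) {
            return fallback;     // the fetch itself threw an exception
        } catch (InterruptedException e) {
            future.cancel(true);
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return fallback;
        }
    }
}
```

With a fixed-size pool, a worker that responds to the interrupt becomes available again for the next fetch, which approximates replacing the hung thread with a new one.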
Speculative re-execution is among the list of features we'd like to add,
but it is not a high priority for me.
Doug
Re: reprocessing hanging tasks
Posted by Stefan Groschupf <sg...@media-style.com>.
Doug,
I have definitely run into problems several times where task-trackers
were still sending heartbeat messages but were no longer processing the
job.
For example, no new pages were fetched, and the pages/sec statistic
became slower and slower.
Personally, I think it makes more sense for the jobtracker to decide
whether a task is over the average processing time and needs to be
re-executed or not.
The last section of the Google paper covers this issue, and they
note performance improvements from re-executing tasks that run over a
specific time.
Maybe we misunderstand each other: I do not mean tasks that crash, I
mean tasks that are 20 times slower on one machine than the other tasks
on the other machines.
Stefan
On 10.10.2005 at 20:16, Doug Cutting wrote:
> Stefan Groschupf wrote:
>
>> Did I miss the section in the jobtracker where this is done, or
>> are people interested in me submitting a patch that implements this
>> mechanism?
>>
>
> This is mostly already implemented. The tasktracker fails tasks
> that do not update their status within a configurable timeout.
> Task status is updated each time a task reads an input, writes an
> output or calls the Reporter.setStatus() method. The jobtracker
> will retry failed tasks up to four times.
>
> The mapred-based fetcher also should not hang. It will exit even
> when it has hung threads. So the task timeout should be set to the
> maximum amount of time that any single page should require to fetch
> & parse. By default it is set to 10 minutes.
>
> Doug
>
>
---------------------------------------------------------------
company: http://www.media-style.com
forum: http://www.text-mining.org
blog: http://www.find23.net
Re: reprocessing hanging tasks
Posted by Doug Cutting <cu...@nutch.org>.
Stefan Groschupf wrote:
> Did I miss the section in the jobtracker where this is done, or are
> people interested in me submitting a patch that implements this mechanism?
This is mostly already implemented. The tasktracker fails tasks that do
not update their status within a configurable timeout. Task status is
updated each time a task reads an input, writes an output or calls the
Reporter.setStatus() method. The jobtracker will retry failed tasks up
to four times.
The mapred-based fetcher also should not hang. It will exit even when
it has hung threads. So the task timeout should be set to the maximum
amount of time that any single page should require to fetch & parse. By
default it is set to 10 minutes.
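The timeout-and-retry behaviour described above could look roughly like this. It is a simplified illustration, not the actual tasktracker or jobtracker code: TaskProgressMonitor and its methods are hypothetical, and reading "retry up to four times" as four retries after the first attempt is an assumption.

```java
// Hypothetical sketch; not the actual Nutch tasktracker/jobtracker classes.
final class TaskProgressMonitor {

    static final int MAX_RETRIES = 4; // assumed reading of "retry up to four times"

    private final long timeoutMillis;  // e.g. 10 minutes = 600_000 ms
    private long lastProgressMillis;
    private int retries = 0;

    TaskProgressMonitor(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastProgressMillis = nowMillis;
    }

    /** Called whenever the task reads an input, writes an output, or sets its status. */
    void reportProgress(long nowMillis) {
        lastProgressMillis = nowMillis;
    }

    /** True if the task has been silent for longer than the timeout and should be failed. */
    boolean isTimedOut(long nowMillis) {
        return nowMillis - lastProgressMillis > timeoutMillis;
    }

    /** Records a retry and resets the progress clock; returns false once retries are used up. */
    boolean retry(long nowMillis) {
        if (retries >= MAX_RETRIES) {
            return false;
        }
        retries++;
        lastProgressMillis = nowMillis;
        return true;
    }
}
```

Because any progress report resets the clock, a task that is slow but still reading input never times out under this scheme, which is the case the speculative re-execution discussion in this thread is about.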
Doug