Posted to user@nutch.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/07/16 12:28:53 UTC

Fetcher thread time out

Hi,

With large map output the task tracker can time out (no progress update during
merge). Using io.sort.factor I can tune the merge phase to proceed a bit
faster, yet it can still time out when the cluster is very busy. I've
increased the task timeout, but now it also takes longer to get rid of hanging
threads.

The fetcher thread timeout is mapred.task.timeout / 2. That makes sense, but I
guess it would make more sense to reduce the timeout value even further; why
would I want to wait so long for a thread to be aborted anyway? As it is, a
single mapper can have a huge impact on average throughput.
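
To make the relationship concrete, here is a minimal standalone sketch of how
a thread timeout could be derived from the task timeout. It is not Nutch's
actual code: the class name, the method name and the idea of a configurable
divisor are assumptions made for illustration. The 600000 ms figure is
Hadoop's default for mapred.task.timeout and is used only as an example.

```java
// Hypothetical illustration; not taken from the Nutch source.
final class FetcherTimeoutSketch {

    /**
     * Derives the fetcher thread timeout from the task timeout.
     * A larger divisor means hung threads are given up on sooner.
     */
    static long threadTimeoutMs(long taskTimeoutMs, int divisor) {
        if (taskTimeoutMs <= 0 || divisor <= 0) {
            throw new IllegalArgumentException("timeout and divisor must be positive");
        }
        return taskTimeoutMs / divisor;
    }

    public static void main(String[] args) {
        long taskTimeoutMs = 600_000L; // example value for mapred.task.timeout

        // Current behaviour described above: half of the task timeout.
        System.out.println(threadTimeoutMs(taskTimeoutMs, 2));

        // A more aggressive, assumed divisor shortens the wait.
        System.out.println(threadTimeoutMs(taskTimeoutMs, 8));
    }
}
```

With the divisor fixed at 2, raising the task timeout to survive slow merges
also doubles the wait before a hung fetcher thread is abandoned, which is the
trade-off described above.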

Thoughts?
Thanks

Re: Fetcher thread time out

Posted by Markus Jelsma <ma...@openindex.io>.
I reduced the thread timeout to mapred.task.timeout / 8, meaning 150
seconds in my case. This actually helps for mappers that finish the queue but
remain hanging on some items, and it helps to end a task early, instead of
having it killed, when it runs on a server with too high a load. My VMs suffer
from having RAID-5 enabled and a shortage of RAM, so I/O wait is high. Fetcher
threads that would normally be killed by the task tracker are now timed out
instead. This means that whatever has been fetched is saved and no replacement
map task is started, which would increase run time again.
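
As a worked example of the numbers above: a thread timeout of 150 seconds at
mapred.task.timeout / 8 implies a task timeout of 1,200,000 ms (20 minutes),
so with the original divisor of 2 a hung thread would have been waited on for
600 seconds. The sketch below shows the kind of check that lets the fetcher
give up on its own before the task tracker kills the task. The class and
method names are hypothetical and the logic is a simplification, not the
actual Nutch implementation.

```java
// Hypothetical illustration; not taken from the Nutch source.
final class HungThreadCheckSketch {

    /**
     * Returns true when no request has been started for longer than the
     * thread timeout, i.e. the remaining threads are considered hung.
     */
    static boolean isHung(long lastRequestStartMs, long nowMs, long threadTimeoutMs) {
        return nowMs - lastRequestStartMs > threadTimeoutMs;
    }

    public static void main(String[] args) {
        long threadTimeoutMs = 150_000L;          // 150 seconds, as in the message
        long taskTimeoutMs = threadTimeoutMs * 8; // implied mapred.task.timeout

        long lastRequestStartMs = 0L;
        long nowMs = 200_000L; // 200 seconds without a new request

        System.out.println("task timeout (ms): " + taskTimeoutMs);
        // With divisor 8 the fetcher has already given up at this point ...
        System.out.println("hung at /8: " + isHung(lastRequestStartMs, nowMs, threadTimeoutMs));
        // ... while with divisor 2 it would still be waiting.
        System.out.println("hung at /2: " + isHung(lastRequestStartMs, nowMs, taskTimeoutMs / 2));
    }
}
```

Because the fetcher ends the map itself once the check fires, output already
written is kept, whereas a task killed by the tracker is retried from scratch.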

Comments?

> Hi,
> 
> With large map output the task tracker can time out (no progress update
> during merge). Using io.sort.factor I can tune the merge phase to proceed
> a bit faster, yet it can still time out when the cluster is very busy.
> I've increased the task timeout, but now it also takes longer to get rid
> of hanging threads.
> 
> The fetcher thread timeout is mapred.task.timeout / 2. That makes sense, but
> I guess it would make more sense to reduce the timeout value even
> further; why would I want to wait so long for a thread to be aborted anyway?
> As it is, a single mapper can have a huge impact on average throughput.
> 
> Thoughts?
> Thanks