Posted to common-user@hadoop.apache.org by Jun Rao <ju...@almaden.ibm.com> on 2007/07/06 07:29:14 UTC

Re: map task in initializing phase for too long

I enabled logging. The slow map task was caused by the socket connection
call in setupIOstreams(), which is triggered by the first RPC call
(getProtocolVersion()) from MapTask to TaskTracker. When the socket
connection call was made at time t1, it did not return until about
t1 + 200 seconds (normally, an entire map task takes about 8 seconds). On
the RPC server side, doAccept() was likewise not called until about
t1 + 200 seconds. I ran a job with 200+ splits 10 times. On average, there
was one slow map task per run, and every slow map task spent about 200
seconds making the socket connection. I was using a recent 64-bit IBM JVM
on SUSE.
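
A minimal sketch of how the connection delay described above could be measured in isolation. It only times a plain TCP connect to a given host and port; it does not perform the Hadoop RPC handshake and does not call setupIOstreams(). The class name ConnectTimer, the helper timeConnectMillis, the 5-second "slow" threshold, and the default of 200 attempts are assumptions made for illustration, not part of Hadoop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop: times raw TCP connects so that
// an occasional multi-minute stall can be spotted outside of a MapReduce job.
class ConnectTimer {
    // Assumed threshold for reporting an attempt as slow.
    static final long SLOW_THRESHOLD_MILLIS = 5_000L;

    /**
     * Attempts one TCP connect and returns the elapsed wall-clock time in
     * milliseconds, whether the attempt succeeded, failed, or timed out.
     */
    public static long timeConnectMillis(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        } catch (IOException e) {
            // Failures are logged but still timed, since a slow failure is also of interest.
            System.err.println("connect to " + host + ":" + port + " failed: " + e);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: java ConnectTimer <host> <port> [attempts]");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        int attempts = args.length > 2 ? Integer.parseInt(args[2]) : 200;
        for (int i = 0; i < attempts; i++) {
            // A 300-second timeout leaves room to observe a stall of about 200 seconds.
            long elapsed = timeConnectMillis(host, port, 300_000);
            if (elapsed > SLOW_THRESHOLD_MILLIS) {
                System.out.println("attempt " + i + " was slow: " + elapsed + " ms");
            }
        }
    }
}
```

If a run against the TaskTracker's RPC port shows the same occasional stall, that would suggest the delay sits below the Hadoop RPC layer (JVM or operating system networking) rather than in the MapTask code itself.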

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)

Doug Cutting <cu...@apache.org>
06/21/2007 09:21 AM
Please respond to: hadoop-user@lucene.apache.org
To: hadoop-user@lucene.apache.org
Subject: Re: map task in initializing phase for too long

Jun Rao wrote:
> I am wondering if anyone has experienced this problem. Sometimes when I
> run a job, a few map tasks (often just one) hang in the initializing phase
> for more than 3 minutes (it normally finishes in a couple of seconds). They
> will eventually finish, but the whole job is slowed down considerably. The
> weird thing is that the slow task is not deterministic. It doesn't always
> occur, and if it does, it can occur on any split and on any host.

I have not seen this.

Perhaps you can get a stack trace from the tasktracker while this is 
happening?

Owen described how to get such stack traces in:

http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/%3c3AB557D4-4B71-4286-BB36-1A449F28BAD5@yahoo-inc.com%3e


Owen wrote:
> One side note is that all of the servers have a servlet such that if 
> you do http://<node>:<port>/stacks you'll get a stack trace of all 
> the threads in the server. I find that useful for remote debugging. 
> *smile* Although if it is a task jvm that has the problem, then there 
> isn't a server for them.
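
A minimal sketch of fetching the /stacks page that Owen describes, for use from a script or a monitoring job instead of a browser. The class name StacksFetcher and its helpers are hypothetical; the node name and HTTP port are supplied by the caller, since they depend on the cluster configuration. As Owen notes, this only reaches the server daemons, not a task JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Hadoop: downloads the thread dump served
// at http://<node>:<port>/stacks by a Hadoop server daemon.
class StacksFetcher {
    /** Builds the URL in the form given in the quoted message. */
    public static String stacksUrl(String node, int port) {
        return "http://" + node + ":" + port + "/stacks";
    }

    /** Performs an HTTP GET and returns the response body as text. */
    public static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java StacksFetcher <node> <port>");
            return;
        }
        String url = stacksUrl(args[0], Integer.parseInt(args[1]));
        // The 10-second timeout is an arbitrary choice for this sketch.
        System.out.print(fetch(url, 10_000));
    }
}
```

Running it repeatedly while a task is stuck in the initializing phase, and saving each dump with a timestamp, makes it easier to compare what the TaskTracker threads were doing during the stall.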

(This should probably be added to the documentation or the wiki...)

Doug