Posted to common-user@hadoop.apache.org by Jun Rao <ju...@almaden.ibm.com> on 2007/06/21 02:20:27 UTC

map task in initializing phase for too long

Hi,

I am wondering if anyone has experienced this problem. Sometimes when I 
run a job, a few map tasks (often just one) hang in the initializing phase 
for more than 3 minutes (it normally finishes in a couple of seconds). They 
eventually finish, but the whole job is slowed down considerably. The 
weird thing is that the slow task is not deterministic: it doesn't always 
occur, and when it does, it can occur on any split and on any host.

I'd appreciate any help on understanding this.

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)

Re: map task in initializing phase for too long

Posted by Doug Cutting <cu...@apache.org>.
Raghu Angadi wrote:
> Doug Cutting wrote:
>> Owen wrote:
>>> One side note is that all of the servers have a servlet such that if  
>>> you do http://<node>:<port>/stacks you'll get a stack trace of all  
>>> the threads in the server. I find that useful for remote debugging.  
>>> *smile* Although if it is a task jvm that has the problem, then 
>>> there  isn't a server for them.
>>
>> (This should probably be added to the documentation or the wiki...)
> 
> We should. This is very useful. I have been working on Hadoop for months 
> but still didn't know.

A good place might be on:

http://wiki.apache.org/lucene-hadoop/HowToDebugMapReducePrograms

Doug
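
As Owen notes, a task JVM has no /stacks servlet. One way to get similar information from inside such a JVM is the standard Thread.getAllStackTraces() call. The following is only a minimal sketch, not part of Hadoop: the class name ThreadDumper and the method dumpAllThreads() are made up for illustration, and how you trigger the dump (a timer, a signal handler, a debug flag) is left open.

```java
import java.util.Map;

// Hypothetical helper, not a Hadoop class: dumps the stacks of all live
// threads in the current JVM, similar in spirit to the /stacks servlet.
class ThreadDumper {

    /** Returns a text dump with one block per live thread. */
    static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            out.append('"').append(thread.getName()).append('"')
               .append(" state=").append(thread.getState()).append('\n');
            for (StackTraceElement frame : entry.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() periodically from a daemon thread while a task is stuck in the initializing phase should show which call the main thread is blocked in.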

Re: map task in initializing phase for too long

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Doug Cutting wrote:
> Owen wrote:
>> One side note is that all of the servers have a servlet such that if  
>> you do http://<node>:<port>/stacks you'll get a stack trace of all  
>> the threads in the server. I find that useful for remote debugging.  
>> *smile* Although if it is a task jvm that has the problem, then there  
>> isn't a server for them.
> 
> (This should probably be added to the documentation or the wiki...)

We should. This is very useful. I have been working on Hadoop for months 
but still didn't know.

Raghu.

> Doug


Re: map task in initializing phase for too long

Posted by Jun Rao <ju...@almaden.ibm.com>.
I enabled logging. The delay occurs in the socket connect call inside 
setupIOstreams(), which is triggered by the first RPC call, 
getProtocolVersion(), from the MapTask to the TaskTracker. If the connect 
call is made at time t1, it does not return until about t1 + 200 seconds 
(normally, each map task takes about 8 seconds in total). On the RPC server 
side, doAccept() is also not called until about t1 + 200 seconds. I ran a 
job with 200+ splits 10 times. On average, there was one slow map task per 
run, and every slow map task took about 200 seconds to make the socket 
connection. I am using a recent 64-bit IBM JVM on SUSE Linux.
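
To check whether the delay can be reproduced outside of Hadoop, one could time a plain TCP connect to the same host and port in a loop. This is only a sketch under assumptions: ConnectTimer and timeConnectMillis() are hypothetical names, the host and port come from the command line, and it does not go through the Hadoop RPC client at all.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical stand-alone probe, not Hadoop code: measures how long a
// plain TCP connect takes, independent of the RPC layer.
class ConnectTimer {

    /** Opens one connection and returns the elapsed time in milliseconds. */
    static long timeConnectMillis(String host, int port, int timeoutMillis)
            throws IOException {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            // Throws SocketTimeoutException if the connect exceeds the timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        }
        return (System.nanoTime() - start) / 1_000_000L;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ConnectTimer <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Repeat the connect so that an occasional slow one has a chance to show up.
        for (int i = 0; i < 200; i++) {
            long millis = timeConnectMillis(host, port, 300_000);
            System.out.println("connect " + i + ": " + millis + " ms");
        }
    }
}
```

If an occasional connect here also takes around 200 seconds, the problem is probably below Hadoop (JVM, OS, or network); if not, the RPC client or server is the more likely place to look.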

Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA  95120-6099

junrao@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)




Doug Cutting <cu...@apache.org> wrote on 06/21/2007 09:21 AM
(to hadoop-user@lucene.apache.org):

Jun Rao wrote:
> I am wondering if anyone has experienced this problem. Sometimes when I 
> ran a job, a few map tasks (often just one) hang in the initializing phase 
> for more than 3 minutes (it normally finishes in a couple seconds). They 
> will eventually finish, but the whole job is slowed down considerably. The 
> weird thing is that the slow task is not deterministic. It doesn't always 
> occur and if does, can occur on any split and on any host.

I have not seen this.

Perhaps you can get a stack trace from the tasktracker while this is 
happening?

Owen described how to get such stack traces in:

http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/%3c3AB557D4-4B71-4286-BB36-1A449F28BAD5@yahoo-inc.com%3e


Owen wrote:
> One side note is that all of the servers have a servlet such that if 
> you do http://<node>:<port>/stacks you'll get a stack trace of all 
> the threads in the server. I find that useful for remote debugging. 
> *smile* Although if it is a task jvm that has the problem, then there 
> isn't a server for them.

(This should probably be added to the documentation or the wiki...)

Doug


Re: map task in initializing phase for too long

Posted by Doug Cutting <cu...@apache.org>.
Jun Rao wrote:
> I am wondering if anyone has experienced this problem. Sometimes when I 
> ran a job, a few map tasks (often just one) hang in the initializing phase 
> for more than 3 minutes (it normally finishes in a couple seconds). They 
> will eventually finish, but the whole job is slowed down considerably. The 
> weird thing is that the slow task is not deterministic. It doesn't always 
> occur and if does, can occur on any split and on any host.

I have not seen this.

Perhaps you can get a stack trace from the tasktracker while this is 
happening?

Owen described how to get such stack traces in:

http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/%3c3AB557D4-4B71-4286-BB36-1A449F28BAD5@yahoo-inc.com%3e

Owen wrote:
> One side note is that all of the servers have a servlet such that if  
> you do http://<node>:<port>/stacks you'll get a stack trace of all  
> the threads in the server. I find that useful for remote debugging.  
> *smile* Although if it is a task jvm that has the problem, then there  
> isn't a server for them.

(This should probably be added to the documentation or the wiki...)

Doug
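
For anyone who wants to capture the /stacks output from a script rather than a browser, a small client along these lines might do. It is a sketch, not an official tool: StacksFetcher, stacksUrl() and fetch() are hypothetical names, and the node and port are whatever your server's HTTP interface listens on (passed on the command line here, since the thread does not give them).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client, not part of Hadoop: fetches the thread dump that
// the servers expose at http://<node>:<port>/stacks.
class StacksFetcher {

    /** Builds the servlet URL in the form Owen describes. */
    static String stacksUrl(String node, int port) {
        return "http://" + node + ":" + port + "/stacks";
    }

    /** Performs an HTTP GET and returns the response body as text. */
    static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java StacksFetcher <node> <port>");
            return;
        }
        String url = stacksUrl(args[0], Integer.parseInt(args[1]));
        System.out.print(fetch(url, 10_000));
    }
}
```

Running it against the tasktracker a few times while a map task is stuck gives a series of dumps that can be compared to see which server threads are not making progress.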