Posted to common-user@hadoop.apache.org by Jun Rao <ju...@almaden.ibm.com> on 2007/07/06 07:29:14 UTC
Re: map task in initializing phase for too long
I enabled logging. The slow map task was caused by a socket connection
call in setupIOstreams() (triggered by the first RPC call,
getProtocolVersion(), from MapTask to TaskTracker). If the socket
connection call was made at t1, it didn't return until t1 + ~200
seconds (normally, each map task takes about 8 seconds). On the RPC server
side, doAccept() was likewise not called until t1 + ~200 seconds. I ran a
job with 200+ splits 10 times. On average, there was one slow map task per
run (every slow map task took ~200 seconds to make the socket connection).
I was using a recent 64-bit IBM JVM on SUSE.
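For anyone who wants to check whether the connect latency described above shows up outside Hadoop, here is a minimal sketch that times a plain TCP connect from Java. The class name ConnectTimer, the method timeConnectMillis, and the command-line arguments are hypothetical names chosen for illustration; this is not the code in setupIOstreams(), only a standalone probe using java.net.Socket.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimer {
    /**
     * Returns the wall-clock time, in milliseconds, taken by a TCP connect
     * to host:port. Throws IOException if the connect fails or does not
     * complete within timeoutMillis.
     */
    public static long timeConnectMillis(String host, int port, int timeoutMillis)
            throws IOException {
        try (Socket socket = new Socket()) {
            long start = System.nanoTime();
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java ConnectTimer <host> <port>
        if (args.length < 2) {
            System.err.println("usage: java ConnectTimer <host> <port>");
            return;
        }
        String host = args[0];
        int port = Integer.parseInt(args[1]);
        // Repeat the connect a few times, since the slow case is intermittent.
        for (int i = 0; i < 10; i++) {
            long ms = timeConnectMillis(host, port, 300_000);
            System.out.println("connect " + i + ": " + ms + " ms");
        }
    }
}
```

Pointing this at the TaskTracker's RPC port from the same host as the slow task might help separate a network or JVM-level delay from anything Hadoop-specific.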
Jun
IBM Almaden Research Center
K55/B1, 650 Harry Road, San Jose, CA 95120-6099
junrao@almaden.ibm.com
(408)927-1886 (phone)
(408)927-3215 (fax)
From: Doug Cutting <cu...@apache.org>
Date: 06/21/2007 09:21 AM
Reply-To: hadoop-user@lucene.apache.org
To: hadoop-user@lucene.apache.org
Subject: Re: map task in initializing phase for too long
Jun Rao wrote:
> I am wondering if anyone has experienced this problem. Sometimes when I
> ran a job, a few map tasks (often just one) hang in the initializing
> phase for more than 3 minutes (it normally finishes in a couple seconds).
> They will eventually finish, but the whole job is slowed down
> considerably. The weird thing is that the slow task is not deterministic.
> It doesn't always occur and if it does, can occur on any split and on any
> host.
I have not seen this.
Perhaps you can get a stack trace from the tasktracker while this is
happening?
Owen described how to get such stack traces in:
http://mail-archives.apache.org/mod_mbox/lucene-hadoop-user/200706.mbox/%3c3AB557D4-4B71-4286-BB36-1A449F28BAD5@yahoo-inc.com%3e
Owen wrote:
> One side note is that all of the servers have a servlet such that if
> you do http://<node>:<port>/stacks you'll get a stack trace of all
> the threads in the server. I find that useful for remote debugging.
> *smile* Although if it is a task jvm that has the problem, then there
> isn't a server for them.
(This should probably be added to the documentation or the wiki...)
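As a rough illustration of the /stacks servlet Owen mentions, the sketch below fetches that page over HTTP and prints it. The class name StacksFetcher and its methods are hypothetical; the only thing taken from the thread is the http://<node>:<port>/stacks URL shape, and the node and port values must come from your own cluster configuration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URI;

public class StacksFetcher {
    /** Builds the servlet URL in the form http://<node>:<port>/stacks. */
    public static String stacksUrl(String node, int port) {
        return "http://" + node + ":" + port + "/stacks";
    }

    /** Fetches the given URL and returns the response body as text. */
    public static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setConnectTimeout(timeoutMillis);
        conn.setReadTimeout(timeoutMillis);
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java StacksFetcher <node> <http-port>
        if (args.length < 2) {
            System.err.println("usage: java StacksFetcher <node> <http-port>");
            return;
        }
        String url = stacksUrl(args[0], Integer.parseInt(args[1]));
        System.out.print(fetch(url, 10_000));
    }
}
```

As Owen notes, this only covers the server processes; a task JVM has no such servlet, so its stacks would have to be captured some other way.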
Doug