Posted to common-dev@hadoop.apache.org by "Bryan A. Pendleton" <bp...@geekdom.net> on 2006/06/06 20:47:00 UTC

Recent IPC changes?

I've been mostly tracking TRUNK. In particular, I update every few days. I
updated to the post-0.3 trunk yesterday, and now I'm seeing very weird
slowdowns in job distribution.

What appears to be happening is that heartbeats aren't registering with
the job tracker, except after an RPC timeout. If my RPC timeout is the
default (60s), it takes 60s for the refresh cycle - effectively meaning that
it takes 60s for *any* tasks to start being distributed, after a job starts
up. If I turn the IPC timeout down to 10s, then the heartbeats happen at 10s
instead. Is this intended? Is there some new setting that I should be
setting (that's not in hadoop-default, I guess)?

-- 
Bryan A. Pendleton
Ph: (877) geek-1-bp

Re: Recent IPC changes?

Posted by Owen O'Malley <ow...@yahoo-inc.com>.

On Jun 6, 2006, at 11:47 AM, Bryan A. Pendleton wrote:

> What appears to be happening is that heartbeats aren't registering with
> the job tracker, except after an RPC timeout. If my RPC timeout is the
> default (60s), it takes 60s for the refresh cycle - effectively meaning that
> it takes 60s for *any* tasks to start being distributed, after a job starts
> up. If I turn the IPC timeout down to 10s, then the heartbeats happen at 10s
> instead. Is this intended? Is there some new setting that I should be
> setting (that's not in hadoop-default, I guess)?

There shouldn't be. Which call is timing out? It should be in the log 
of the relevant task tracker. Look for the SocketTimeoutException in 
the log. The call stack will tell us which call is causing the problem.

-- Owen
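
For anyone following the advice above, here is a minimal sketch of one way to
pull the SocketTimeoutException stack traces out of a task tracker log. It is
not part of Hadoop; the function name find_timeout_traces and the way the
script is invoked are assumptions made for illustration. It relies only on
the standard Java stack trace layout (frame lines beginning with "at",
"Caused by:", or "... N more").

```python
import sys

# The exception name mentioned in the thread; change it to search for another.
MARKER = "SocketTimeoutException"


def find_timeout_traces(lines, marker=MARKER):
    """Return a list of traces.

    Each trace is a list of strings: the line that mentions `marker`,
    followed by the stack-frame lines that come directly after it.
    (Hypothetical helper, not a Hadoop API.)
    """
    traces = []
    current = None  # the trace being collected, if any
    for raw in lines:
        line = raw.rstrip("\n")
        stripped = line.lstrip()
        # Standard Java stack trace continuation lines.
        is_frame = (
            stripped.startswith("at ")
            or stripped.startswith("Caused by:")
            or stripped.startswith("... ")
        )
        if current is not None and is_frame:
            current.append(line)
            continue
        # Any other line ends the trace being collected.
        if current is not None:
            traces.append(current)
            current = None
        if marker in line:
            current = [line]
    if current is not None:
        traces.append(current)
    return traces


if __name__ == "__main__":
    # Usage (the log path is whatever your task tracker writes to):
    #   python find_timeouts.py <path-to-tasktracker-log>
    with open(sys.argv[1], errors="replace") as log:
        for trace in find_timeout_traces(log):
            print("\n".join(trace))
            print()
```

The frames printed under each match should show which RPC call was in flight
when the timeout fired, which is the information asked for above.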