Posted to common-dev@hadoop.apache.org by "B. X." <bx...@gmail.com> on 2009/09/16 20:49:15 UTC

Re: different modes of inter process communication

On Wed, Aug 19, 2009 at 7:10 PM, Owen O'Malley <om...@apache.org> wrote:

Thank you both for clearing it up.

I have another related question: my understanding is that a basic
heartbeat mechanism is used to keep the different roles (namenode,
datanode, tasktracker, etc.) aware of each other, but I am not able to
observe this in the logs. For example, if I use SIGSTOP/SIGCONT to
suspend the namenode JVM process for 30 seconds and then resume it, I
don't observe any extra communication caused by the supposedly missed
heartbeats. (I checked that dfs.heartbeat.interval is set to 3
seconds.) Rather, all roles seem to stop in unison for those 30
seconds, judging by the absence of log events in that time window.

I would appreciate some pointers on how heartbeats are used and configured.

-Bin

Re: different modes of inter process communication

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
Steve Loughran wrote:
> Raghu Angadi wrote:
>>
>> A heartbeat is also an RPC. When you pause the Namenode for 30 sec, the
>> datanode's heartbeat thread just waits for 30 sec for its heartbeat
>> RPC to return. Note that when you pause the Namenode, the RPCs to it
>> don't fail immediately. During this wait, DNs can perform other
>> transactions like serving data to clients.
> 
> If the heartbeat were just telling the NN that the DN is alive, 

That's not the case in Hadoop. Central servers don't actively contact
their slaves. It's been a long-standing problem. For example, anything
the NN wants to tell a DN has to be sent as the response to a
heartbeat or another RPC.
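
To make the point above concrete, here is a minimal Python sketch of
commands riding back on heartbeat responses. It is an illustration
only, not Hadoop's code: NameNodeSketch, queue_command, and the command
strings are hypothetical names chosen for this example.

```python
from collections import defaultdict, deque


class NameNodeSketch:
    """Hypothetical stand-in for a server that never calls its workers."""

    def __init__(self):
        # One queue of pending commands per datanode id.
        self._pending = defaultdict(deque)

    def queue_command(self, datanode_id, command):
        # The server cannot push this; it is held until the datanode calls in.
        self._pending[datanode_id].append(command)

    def heartbeat(self, datanode_id):
        # The heartbeat's return value carries everything queued so far.
        queue = self._pending[datanode_id]
        commands = list(queue)
        queue.clear()
        return commands


nn = NameNodeSketch()
nn.queue_command("dn-1", "delete block 42")  # made-up command text
first = nn.heartbeat("dn-1")   # picks up the queued command
second = nn.heartbeat("dn-1")  # nothing new since the last heartbeat
```

One consequence of this design is that a command can wait up to one
heartbeat interval before the datanode hears about it.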

Raghu.

> you
> could do it with a UDP datagram that didn't block the DN. If, however,
> the DN needs to know/care that the NN is up, then you do need to care
> about the state of the namenode. But you don't have to do it blocking;
> looking for a UDP reply some time later is all you need to do.
> 
> 


Re: different modes of inter process communication

Posted by Steve Loughran <st...@apache.org>.
Raghu Angadi wrote:
> 
> A heartbeat is also an RPC. When you pause the Namenode for 30 sec, the
> datanode's heartbeat thread just waits for 30 sec for its heartbeat RPC
> to return. Note that when you pause the Namenode, the RPCs to it don't
> fail immediately. During this wait, DNs can perform other transactions
> like serving data to clients.

If the heartbeat were just telling the NN that the DN is alive, you
could do it with a UDP datagram that didn't block the DN. If, however,
the DN needs to know/care that the NN is up, then you do need to care
about the state of the namenode. But you don't have to do it blocking;
looking for a UDP reply some time later is all you need to do.
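
A rough Python sketch of that idea, for illustration: send the
heartbeat as a UDP datagram, carry on, and poll for a reply later. The
message format ("HB n" / "ACK n") and all function names are
assumptions made up for this example, and it ignores datagram loss and
reordering, which a real design would have to handle.

```python
import select
import socket


def send_heartbeat(sock, namenode_addr, seq):
    # sendto() returns once the datagram is handed to the kernel;
    # it does not wait for the receiver to read or answer it.
    sock.sendto(b"HB %d" % seq, namenode_addr)


def poll_ack(sock, timeout=0.0):
    """Return the acknowledged sequence number, or None if none arrived."""
    ready, _, _ = select.select([sock], [], [], timeout)
    if not ready:
        return None
    data, _ = sock.recvfrom(64)
    return int(data.split()[1])


def namenode_reply_once(sock):
    """Hypothetical namenode side: read one heartbeat and acknowledge it."""
    data, addr = sock.recvfrom(64)
    sock.sendto(b"ACK " + data.split()[1], addr)


def demo():
    nn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    dn = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        nn.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        dn.bind(("127.0.0.1", 0))
        nn.settimeout(2.0)
        send_heartbeat(dn, nn.getsockname(), 7)
        early = poll_ack(dn)  # namenode has not answered yet
        namenode_reply_once(nn)  # namenode "wakes up" and replies
        late = poll_ack(dn, timeout=2.0)
        return early, late
    finally:
        nn.close()
        dn.close()
```

The datanode never blocks on the namenode here: an unanswered heartbeat
only shows up as poll_ack() returning None when it checks later.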



Re: different modes of inter process communication

Posted by Raghu Angadi <ra...@yahoo-inc.com>.
A heartbeat is also an RPC. When you pause the Namenode for 30 sec, the
datanode's heartbeat thread just waits for 30 sec for its heartbeat RPC
to return. Note that when you pause the Namenode, the RPCs to it don't
fail immediately. During this wait, DNs can perform other transactions
like serving data to clients.
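
As an illustration of the behaviour described above, here is a small
Python simulation; it is not Hadoop code. FakeNameNode, heartbeat_loop,
and run_demo are hypothetical names, the pause is modelled with a
threading.Event rather than SIGSTOP, and the intervals are shortened so
the demo finishes in about a second. It assumes the heartbeat loop is
synchronous, so a blocked call delays the next heartbeat instead of
queueing up the missed ones.

```python
import threading
import time


class FakeNameNode:
    """Stand-in for a namenode whose process can be paused."""

    def __init__(self):
        self._running = threading.Event()
        self._running.set()
        self.heartbeats_seen = 0

    def pause(self):
        self._running.clear()

    def resume(self):
        self._running.set()

    def heartbeat(self, datanode_id):
        # A paused server does not reject the call; the caller just waits.
        self._running.wait()
        self.heartbeats_seen += 1
        return []  # no commands for the datanode in this sketch


def heartbeat_loop(namenode, stop, interval):
    # The next heartbeat is not sent until the previous call has returned.
    while not stop.is_set():
        namenode.heartbeat("dn-1")
        stop.wait(interval)


def run_demo(interval=0.05, pause_for=0.3):
    nn = FakeNameNode()
    stop = threading.Event()
    hb = threading.Thread(target=heartbeat_loop, args=(nn, stop, interval))
    hb.start()
    time.sleep(interval * 4)  # a few normal heartbeats
    nn.pause()
    time.sleep(interval * 2)  # let any call already past the gate finish
    before = nn.heartbeats_seen
    served = 0
    deadline = time.monotonic() + pause_for
    while time.monotonic() < deadline:
        served += 1  # stands in for serving data to a client
        time.sleep(0.01)
    during = nn.heartbeats_seen - before
    nn.resume()
    time.sleep(interval * 4)  # heartbeats pick up again
    stop.set()
    hb.join()
    return during, served, nn.heartbeats_seen - before
```

run_demo() returns the heartbeats handled during the pause, the client
requests served in the meantime, and the heartbeats handled from the
start of the pause until shutdown. The first value should be zero while
the second keeps growing, which is consistent with the quiet logs in
the original question.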

B. X. wrote:
> On Wed, Aug 19, 2009 at 7:10 PM, Owen O'Malley <om...@apache.org> wrote:
> 
> Thank you both for clearing it up.
> 
> I have another related question: my understanding is that a basic
> heartbeat mechanism is used to keep the different roles (namenode,
> datanode, tasktracker, etc.) aware of each other, but I am not able to
> observe this in the logs. For example, if I use SIGSTOP/SIGCONT to
> suspend the namenode JVM process for 30 seconds and then resume it, I
> don't observe any extra communication caused by the supposedly missed
> heartbeats. (I checked that dfs.heartbeat.interval is set to 3
> seconds.) Rather, all roles seem to stop in unison for those 30
> seconds, judging by the absence of log events in that time window.
> 
> I would appreciate some pointers on how heartbeats are used and configured.
> 
> -Bin