Posted to user@ambari.apache.org by Loïc Chanel <lo...@telecomnancy.net> on 2015/07/23 17:51:18 UTC

HA NameNode switching without reason

Hi,

I am running a high-availability cluster with 3 JournalNodes and 2 NameNodes,
the NameNodes being on 2 of the 3 JournalNode hosts, and the active NameNode
switched hosts 3 times in less than 24 hours for no apparent reason.

This can't be a network problem, as the logs clearly indicate that the
NameNode can't send logs to the JournalNode running on the very same host,
while addressing it by its IP. It doesn't seem to be a CPU or RAM problem
either: the sar command does not report any abnormality, and the Ganglia
graphs show that the JVM has far more memory than it needs.

Does anyone have an idea where the problem might come from?
Thanks in advance,


Loïc

Loïc CHANEL
Engineering student at TELECOM Nancy
Trainee at Worldline - Villeurbanne

Re: HA NameNode switching without reason

Posted by Benoit Perroud <be...@noisette.ch>.
This is probably not the best place to ask such questions, as they are not
specifically related to Ambari but to HDFS.

There are lots of scenarios in which a NN can switch, and there is always a
good reason for that :)

Some of them can be:
- if you're running an older version of Hadoop/HDP (2.1), slow block reports
or a slow fsimage transfer can lead to a NN switch,
- your RPC pool is too small (NN server threads),
- you're hit by the futex lock bug (
https://groups.google.com/forum/#!topic/mechanical-sympathy/QbmpZxp6C64)
and might need to upgrade your kernel(s).
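
To narrow down which of these scenarios applies, one option is to count
suspicious lines in the NameNode log around each switch. The following is
only a minimal sketch: the two message fragments are assumptions about what
a Hadoop 2.x NameNode typically logs (a JVM pause warning and a slow
JournalNode write), and the names PATTERNS and count_symptoms are made up
for this example. Adjust the patterns to what your own log contains.

```python
import re
from collections import Counter

# Assumed message fragments; they may differ between Hadoop versions,
# so check them against a real line from your NameNode log first.
PATTERNS = {
    "jvm_pause": re.compile(r"Detected pause in JVM or host machine"),
    "journal_wait": re.compile(
        r"Waited \d+ ms \(timeout=\d+ ms\) for a response for sendEdits"
    ),
}


def count_symptoms(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Example usage (the path is a placeholder for your own log file):
# with open("/path/to/namenode.log", errors="replace") as log:
#     print(count_symptoms(log))
```

Many JVM pause hits would point towards GC or host stalls (the futex bug
being one possible cause of the latter), while slow JournalNode writes
without pauses would rather point towards disk or RPC contention.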




