Posted to common-user@hadoop.apache.org by cho ju il <tj...@kgrid.co.kr> on 2014/08/01 08:06:46 UTC

Why does suddenly ha switching?

Why did HA suddenly switch over?
In my Hadoop HA cluster, the active NameNode role suddenly switched from host1 to the standby NameNode (host2).
I could not find any error in the Hadoop logs (on any server) that identifies the root cause.

The following entries appeared frequently in the NameNode HDFS logs, and none of the applications could read HDFS files.

*** namenode log
2014-08-01 04:20:39,133 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-08-01 04:20:39,151 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 19 millisecond(s).
2014-08-01 04:21:03,608 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
2014-08-01 04:21:03,728 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 


*** zkfc log
2014-08-01 04:21:03,601 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46910ms for sessionid 0x147000ee1f70137,  closing socket connection and attempting reconnect

2014-08-01 04:21:03,703 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...


Re: Why does suddenly ha switching?

Posted by cho ju il <tj...@kgrid.co.kr>.
Look at the log timestamps.
The NameNode was already switching.
The ZKFC log entries were written after the NameNode had switched.

Timeline:
1.  Something happened here, but the logs do not record what.
2.  2014-08-01 04:21:03,608   HA switching
3.  2014-08-01 04:21:03,601   ZooKeeper session timeout
4.  2014-08-01 04:21:03,728   NameNode shutdown
5.  2014-08-01 04:21:03,703   ZooKeeper session disconnected
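
For reference, the four timestamps in this timeline can be put in strict chronological order with a few lines of Python. This is only a sketch: the event list is copied by hand from the log excerpts in this thread, the names EVENTS, parse_ts and chronological are made up for illustration, and it assumes the namenode and zkfc logs were written against the same clock.

```python
from datetime import datetime

# Events copied from the namenode and zkfc log excerpts quoted in this thread.
EVENTS = [
    ("2014-08-01 04:21:03,608", "namenode", "Stopping services started for active state"),
    ("2014-08-01 04:21:03,601", "zkfc", "Client session timed out"),
    ("2014-08-01 04:21:03,728", "namenode", "SHUTDOWN_MSG"),
    ("2014-08-01 04:21:03,703", "zkfc", "Session disconnected. Entering neutral mode"),
]


def parse_ts(text):
    """Parse a log timestamp whose milliseconds follow a comma."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def chronological(events):
    """Return the events sorted by timestamp, oldest first."""
    return sorted(events, key=lambda event: parse_ts(event[0]))


ordered = chronological(EVENTS)
for ts, source, message in ordered:
    print(ts, source, message)
```

Sorted this way, the zkfc "Client session timed out" entry (04:21:03,601) comes 7 ms before the namenode "Stopping services" entry (04:21:03,608).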
 
 

-----Original Message-----
From: "Gordon Wang" <gwang@pivotal.io>
To: <user@hadoop.apache.org>; "cho ju il" <tjstory@kgrid.co.kr>;
Cc:
Sent: 2014-08-01 (Fri) 15:44:59
Subject: Re: Why does suddenly ha switching?

From the log, it looks like the connections between the NameNodes and the ZK quorum are not stable, and the ZK session timed out. You can check the logs of the ZooKeeper servers; you may find errors about connection failures.

On Fri, Aug 1, 2014 at 2:06 PM, cho ju il <tjstory@kgrid.co.kr> wrote:

Why did HA suddenly switch over?
In my Hadoop HA cluster, the active NameNode role suddenly switched from host1 to the standby NameNode (host2).
I could not find any error in the Hadoop logs (on any server) that identifies the root cause.

The following entries appeared frequently in the NameNode HDFS logs, and none of the applications could read HDFS files.

*** namenode log
2014-08-01 04:20:39,133 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-08-01 04:20:39,151 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 19 millisecond(s).
2014-08-01 04:21:03,608 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services started for active state
2014-08-01 04:21:03,728 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: 

*** zkfc log
2014-08-01 04:21:03,601 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 46910ms for sessionid 0x147000ee1f70137,  closing socket connection and attempting reconnect
2014-08-01 04:21:03,703 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...

-- 
Regards
Gordon Wang




Re: Why does suddenly ha switching?

Posted by Gordon Wang <gw...@pivotal.io>.
From the log, it looks like the connections between the NameNodes and the
ZK quorum are not stable, and the ZK session timed out. You can check the
logs of the ZooKeeper servers; you may find errors about connection
failures.
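
As a rough aid for that check, the silence duration reported in the zkfc line can be pulled out and compared with the session timeout configured for the ZKFC (the `ha.zookeeper.session-timeout.ms` property). The sketch below is only illustrative: the regular expression is written against the single log line quoted in this thread, the helper name parse_session_timeout is made up, and the 5000 ms value is a placeholder that is not taken from this thread, so substitute whatever your own configuration uses.

```python
import re

# The zkfc log line quoted in this thread.
ZKFC_LINE = (
    "2014-08-01 04:21:03,601 INFO org.apache.zookeeper.ClientCnxn: "
    "Client session timed out, have not heard from server in 46910ms "
    "for sessionid 0x147000ee1f70137,  closing socket connection and attempting reconnect"
)

# Pattern written against the line above; other log formats may differ.
SILENCE_RE = re.compile(
    r"have not heard from server in (\d+)ms for sessionid (0x[0-9a-fA-F]+)"
)


def parse_session_timeout(line):
    """Return (silence_ms, session_id) from a ClientCnxn timeout line, or None."""
    match = SILENCE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Placeholder, not from this thread: use the session timeout set for your ZKFC.
ASSUMED_SESSION_TIMEOUT_MS = 5000

silence_ms, session_id = parse_session_timeout(ZKFC_LINE)
print(f"session {session_id}: no server response for {silence_ms} ms")
if silence_ms > ASSUMED_SESSION_TIMEOUT_MS:
    print("silence is longer than the assumed session timeout")
```

Running the same function over the full zkfc log would show whether long silences like this one are isolated or recurring, which can then be lined up against the ZooKeeper server logs for the same period.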


On Fri, Aug 1, 2014 at 2:06 PM, cho ju il <tj...@kgrid.co.kr> wrote:

> Why did HA suddenly switch over?
>
> In my Hadoop HA cluster, the active NameNode role suddenly switched from
> host1 to the standby NameNode (host2).
>
> I could not find any error in the Hadoop logs (on any server) that
> identifies the root cause.
>
> The following entries appeared frequently in the NameNode HDFS logs, and
> none of the applications could read HDFS files.
>
> *** namenode log
>
> 2014-08-01 04:20:39,133 INFO
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
> Rescanning after 30000 milliseconds
>
> 2014-08-01 04:20:39,151 INFO
> org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor:
> Scanned 0 directive(s) and 0 block(s) in 19 millisecond(s).
>
> 2014-08-01 04:21:03,608 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Stopping services
> started for active state
>
> 2014-08-01 04:21:03,728 INFO
> org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
>
>
> *** zkfc log
>
> 2014-08-01 04:21:03,601 INFO org.apache.zookeeper.ClientCnxn: Client
> session timed out, have not heard from server in 46910ms for sessionid
> 0x147000ee1f70137,  closing socket connection and attempting reconnect
>
> 2014-08-01 04:21:03,703 INFO org.apache.hadoop.ha.ActiveStandbyElector:
> Session disconnected. Entering neutral mode...
>
>


-- 
Regards
Gordon Wang
