You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-user@hadoop.apache.org by mail list <lo...@gmail.com> on 2014/12/03 11:15:27 UTC

QJM cost 5 minutes for failover

Hi,all 

we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to work normal.

Log on the standby name node as below:

{log}
014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 6797
2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 6797
2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 9
2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 179
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 4
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[]}
2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. 
{log}

As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor, I have no idea
what hadoop is doing in the five minutes? And how can i reduce the time?






Re: QJM cost 5 minutes for failover

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi,

Are you using ZKFC? If the answer is positive, could you share the
configuration files for ZooKeeper?

Thanks,
- Tsuyoshi

On Wed, Dec 3, 2014 at 2:15 AM, mail list <lo...@gmail.com> wrote:
> Hi,all
>
> we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to work normal.
>
> Log on the standby name node as below:
>
> {log}
> 014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 6797
> 2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 6797
> 2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 9
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
> 2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
> 2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 179
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 4
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
> 2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[]}
> 2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
> {log}
>
> As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor, I have no idea
> what hadoop is doing in the five minutes? And how can i reduce the time?
>
>
>
>
>



-- 
- Tsuyoshi

Re: QJM cost 5 minutes for failover

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi,

Are you using ZKFC? If the answer is positive, could you share the
configuration files for ZooKeeper?

Thanks,
- Tsuyoshi

On Wed, Dec 3, 2014 at 2:15 AM, mail list <lo...@gmail.com> wrote:
> Hi,all
>
> we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to work normal.
>
> Log on the standby name node as below:
>
> {log}
> 014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 6797
> 2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 6797
> 2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 9
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
> 2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
> 2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 179
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 4
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
> 2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[]}
> 2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
> {log}
>
> As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor, I have no idea
> what hadoop is doing in the five minutes? And how can i reduce the time?
>
>
>
>
>



-- 
- Tsuyoshi

Re: QJM cost 5 minutes for failover

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi,

Are you using ZKFC? If the answer is positive, could you share the
configuration files for ZooKeeper?

Thanks,
- Tsuyoshi

On Wed, Dec 3, 2014 at 2:15 AM, mail list <lo...@gmail.com> wrote:
> Hi,all
>
> we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to work normal.
>
> Log on the standby name node as below:
>
> {log}
> 014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 6797
> 2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 6797
> 2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 9
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
> 2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
> 2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 179
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 4
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
> 2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[]}
> 2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
> {log}
>
> As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor, I have no idea
> what hadoop is doing in the five minutes? And how can i reduce the time?
>
>
>
>
>



-- 
- Tsuyoshi

Re: QJM cost 5 minutes for failover

Posted by Tsuyoshi OZAWA <oz...@gmail.com>.
Hi,

Are you using ZKFC? If the answer is positive, could you share the
configuration files for ZooKeeper?

Thanks,
- Tsuyoshi

On Wed, Dec 3, 2014 at 2:15 AM, mail list <lo...@gmail.com> wrote:
> Hi,all
>
> we are testing QJM NAMENODE HA, and when the active name node down,it cost about 5 mins to work normal.
>
> Log on the standby name node as below:
>
> {log}
> 014-12-03 15:55:51,307 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing edit logs at txnid 6797
> 2014-12-03 15:55:51,313 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 6797
> 2014-12-03 15:55:51,373 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 1 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0 9
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
> 2014-12-03 15:55:51,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning because of pending operations
> 2014-12-03 15:55:51,678 INFO org.apache.hadoop.fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
> 2014-12-03 15:55:51,679 INFO org.apache.hadoop.fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Total number of blocks            = 179
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of invalid blocks          = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of under-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of  over-replicated blocks = 0
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Number of blocks being written    = 4
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 386 msec
> 2014-12-03 15:55:51,693 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 308 millisecond(s).
> 2014-12-03 15:56:21,385 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:56:21,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:56:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:57:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
> 2014-12-03 15:58:51,386 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:58:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
> 2014-12-03 15:59:21,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 15:59:51,387 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
> 2014-12-03 15:59:51,388 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
> 2014-12-03 16:00:14,295 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483. Returning previously allocated block blk_1073743458_2634{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[]}
> 2014-12-03 16:00:14,299 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* allocateBlock: caught retry for allocation of a new block in /hbase/testnn/WALs/l-hbase3.dba.dev.cn0.qunar.com,60020,1417585992012/l-hbase3.dba.dev.cn0.qunar.com%2C60020%2C1417585992012.1417593301483.
> {log}
>
> As you seen, the time from 15:55:51 to 15:59:51 is all org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor, I have no idea
> what hadoop is doing in the five minutes? And how can i reduce the time?
>
>
>
>
>



-- 
- Tsuyoshi