Posted to user@hadoop.apache.org by lei liu <li...@gmail.com> on 2013/08/01 10:43:03 UTC

Standby NameNode checkpoint exception

I am using hadoop-2.0.5 with QJM for HA.

When the Standby NameNode does a checkpoint, the following exception appears
in the Standby NameNode log:
2013-08-01 13:43:07,965 INFO
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer: Triggering
checkpoint because there have been 763426 txns since the last checkpoint,
which exceeds the configured threshold 40000
2013-08-01 13:43:07,966 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Saving image file
/home/musa.ll/hadoop2/cluster-data/name/current/fsimage.ckpt_0000000000048708235
using no compression
2013-08-01 13:43:37,405 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
1504089705 saved in 29 seconds.
2013-08-01 13:43:37,410 INFO
org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
retain 2 images with txid >= 47944809
2013-08-01 13:43:37,410 INFO
org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Purging
old image
FSImageFile(file=/home/musa.ll/hadoop2/cluster-data/name/current/fsimage_0000000000047222679,
cpktTxId=0000000000047222679)
2013-08-01 13:43:37,723 WARN
org.apache.hadoop.hdfs.server.namenode.FSEditLog: Unable to determine input
streams from QJM to [10.232.98.61:20022, 10.232.98.62:20022,
10.232.98.63:20022, 10.232.98.64:20022, 10.232.98.65:20022]. Skipping.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 46944810 which is in the middle of
file
/home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000046630461-0000000000047222679
        at
org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
        at
org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
        at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
        at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
        at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
2013-08-01 14:28:07,051 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
26.08s at 0.00 KB/s
2013-08-01 14:28:07,051 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 60835762 to namenode at 10.232.98.77:20021
2013-08-01 14:29:05,203 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Triggering log
roll on remote NameNode /10.232.98.77:20020
2013-08-01 14:29:06,242 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log:
137678/567332 transactions completed. (24%)
2013-08-01 14:29:07,243 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log:
275618/567332 transactions completed. (49%)
2013-08-01 14:29:08,244 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log:
407627/567332 transactions completed. (72%)
2013-08-01 14:29:09,245 INFO
org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: replaying edit log:
545153/567332 transactions completed. (96%)
2013-08-01 14:29:20,146 INFO
org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer: Loaded 567332
edits starting from txid 60835762
2013-08-01 14:30:44,411 INFO
org.apache.hadoop.hdfs.server.namenode.FSImage: Image file of size
1950604672 saved in 37 seconds.
2013-08-01 14:30:44,416 INFO
org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager: Going to
retain 2 images with txid >= 60835762
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many
exceptions to achieve quorum size 3/5. 4 exceptions thrown:
10.232.98.62:20022: Asked for firstTxId 59835763 which is in the middle of
file
/home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
        at
org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
        at
org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
        at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
        at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
        at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

10.232.98.63:20022: Asked for firstTxId 59835763 which is in the middle of
file
/home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
        at
org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
        at
org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
        at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
        at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
        at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

10.232.98.64:20022: Asked for firstTxId 59835763 which is in the middle of
file
/home/musa.ll/hadoop2/journal/mycluster/current/edits_0000000000059678382-0000000000060264590
        at
org.apache.hadoop.hdfs.server.namenode.FileJournalManager.getRemoteEditLogs(FileJournalManager.java:183)
        at
org.apache.hadoop.hdfs.qjournal.server.Journal.getEditLogManifest(Journal.java:628)
        at
org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.getEditLogManifest(JournalNodeRpcServer.java:180)
        at
org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.getEditLogManifest(QJournalProtocolServerSideTranslatorPB.java:203)
        at
org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:14028)
        at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1014)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1741)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1737)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1735)

        at
org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
        at
org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:213)
        at
org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
        at
org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager.selectInputStreams(QuorumJournalManager.java:455)
        at
org.apache.hadoop.hdfs.server.namenode.JournalSet.selectInputStreams(JournalSet.java:249)
        at
org.apache.hadoop.hdfs.server.namenode.FSEditLog.selectInputStreams(FSEditLog.java:1130)
        at
org.apache.hadoop.hdfs.server.namenode.NNStorageRetentionManager.purgeOldStorage(NNStorageRetentionManager.java:111)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.purgeOldStorage(FSImage.java:946)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImageInAllDirs(FSImage.java:931)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveNamespace(FSImage.java:868)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:165)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1100(StandbyCheckpointer.java:53)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.doWork(StandbyCheckpointer.java:297)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.access$300(StandbyCheckpointer.java:210)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread$1.run(StandbyCheckpointer.java:230)
        at
org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
        at
org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer$CheckpointerThread.run(StandbyCheckpointer.java:226)
2013-08-01 14:30:44,799 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Opening connection
to http://10.232.98.77:20021/getimage?putimage=1&txid=61403094&port=20021&storageInfo=-40:1499625118:0:CID-921af0aa-b831-4828-965c-3b71a5149600
2013-08-01 14:31:15,974 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Transfer took
31.18s at 0.00 KB/s
2013-08-01 14:31:15,974 INFO
org.apache.hadoop.hdfs.server.namenode.TransferFsImage: Uploaded image with
txid 61403094 to namenode at 10.232.98.77:20021
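
One thing that stands out in the log: the firstTxId in the exception looks like it is derived from the txid in the preceding "Going to retain 2 images" line. Below is a small Python sketch of that arithmetic, using only numbers copied from the log. The 1,000,000 offset is an assumption inferred from the gap between the two values (it would match the default of dfs.namenode.num.extra.edits.retained, if that setting was left unchanged), and the helper names are made up for illustration.

```python
# Values copied from the log lines above.
retained_image_txid = 60835762   # "Going to retain 2 images with txid >= 60835762"
requested_first_txid = 59835763  # "Asked for firstTxId 59835763"
segment_name = "edits_0000000000059678382-0000000000060264590"

# Assumption: number of extra edits kept behind the oldest retained image.
# Inferred from the gap between the two txids above, not read from any config.
EXTRA_EDITS_RETAINED = 1_000_000


def segment_range(name):
    """Parse 'edits_<first>-<last>' into a (first_txid, last_txid) tuple."""
    first, last = name[len("edits_"):].split("-")
    return int(first), int(last)


def is_mid_segment(txid, name):
    """True if txid lies inside the segment but is not its first txid."""
    first, last = segment_range(name)
    return first < txid <= last


# Hypothetical reconstruction of the purge starting point.
purge_from = retained_image_txid + 1 - EXTRA_EDITS_RETAINED

print(purge_from == requested_first_txid)
print(is_mid_segment(requested_first_txid, segment_name))
```

The same relation holds for the first occurrence (47944809 + 1 - 1000000 = 46944810), so the requested txid does not seem to be aligned to a segment boundary in either case.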


How can I handle this exception?

Thanks,

LiuLei