Posted to hdfs-dev@hadoop.apache.org by Kihwal Lee <ki...@verizonmedia.com> on 2019/08/27 13:41:27 UTC

Re: [Help] NameNode error

That's not supposed to happen. What version of Hadoop are you using?
Please file a jira with details including how the namenodes are configured.

For the recovery:
First and foremost, do not shut down the active namenode. Put it into safe
mode and issue a saveNamespace command to create a checkpoint.  Then use
the bootstrapStandby command to re-initialize the standby.
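The recovery order above can be written out as a sequence of `hdfs` commands. Below is a minimal, hypothetical Java dry-run helper that only assembles the command lines and never executes them. Several details are assumptions for illustration and are not part of the advice above: the host names, the 8020 RPC port, the use of the generic `-fs` option to target only the active NameNode, and the final `-safemode leave` step. Check each command against the HDFS Commands Guide for your release before running it.

```java
import java.util.List;

/** Hypothetical dry-run helper: assembles, but never executes, the recovery commands. */
public class RecoveryPlan {

    /** One step of the plan: the host to run it on and the command line. */
    static final class Step {
        final String host;
        final List<String> command;

        Step(String host, String... command) {
            this.host = host;
            this.command = List.of(command);
        }

        @Override
        public String toString() {
            return "[" + host + "] " + String.join(" ", command);
        }
    }

    /**
     * Builds the steps in the order suggested in the thread.
     * activeRpc is an assumed address such as "hdfs://nn1.example.com:8020".
     */
    static List<Step> plan(String activeHost, String activeRpc, String standbyHost) {
        return List.of(
            // 1. Keep the active NameNode running, but stop namespace changes.
            new Step(activeHost, "hdfs", "dfsadmin", "-fs", activeRpc, "-safemode", "enter"),
            // 2. Write a fresh fsimage checkpoint on the active NameNode.
            new Step(activeHost, "hdfs", "dfsadmin", "-fs", activeRpc, "-saveNamespace"),
            // 3. On the standby host, with its NameNode process stopped, re-initialize
            //    the standby's metadata from the active NameNode's latest checkpoint.
            new Step(standbyHost, "hdfs", "namenode", "-bootstrapStandby"),
            // 4. Assumption (not in the original advice): leave safe mode once the
            //    standby has been rebuilt, so clients can write again.
            new Step(activeHost, "hdfs", "dfsadmin", "-fs", activeRpc, "-safemode", "leave"));
    }

    public static void main(String[] args) {
        for (Step step : plan("nn1.example.com", "hdfs://nn1.example.com:8020", "nn2.example.com")) {
            System.out.println(step);
        }
    }
}
```

Running `main` prints one line per step, prefixed with the host the step should run on. Keeping the plan as data makes the ordering easy to review: the checkpoint on the active NameNode is written before the standby is rebuilt from it.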

Hope it helps.

Kihwal


On Tue, Aug 27, 2019 at 7:12 AM Lionel CL <wh...@outlook.com> wrote:

> Hi committee,
> We encountered the NN error below.
> The primary NN was shut down last Thursday, and we recovered it by removing
> some ops from the edit log. But today the standby NN shut down again because
> of the same error.
> Could you please help identify the possible root cause?
>
> 2019-08-27 09:51:14,075 ERROR org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader: Encountered exception on operation CloseOp [length=0, inodeId=0, path=/******/v2-data-20190826.data, replication=2, mtime=1566870616821, atime=1566870359230, blockSize=134217728, blocks=[blk_1270599798_758966421, blk_1270599852_758967928, blk_1270601282_759026903, blk_1270602443_759027052, blk_1270602446_759061086, blk_1270603081_759050235], permissions=smc_ss:smc_ss:rw-r--r--, aclEntries=null, clientName=, clientMachine=, overwrite=false, storagePolicyId=0, erasureCodingPolicyId=0, opCode=OP_CLOSE, txid=4359520942]
> java.io.IOException: Mismatched block IDs or generation stamps, attempting to replace block blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6 of /******/v2-data-20190826.mayfly.data
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.updateBlocks(FSEditLogLoader.java:1096)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:452)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:249)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:158)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:888)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:869)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:293)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:427)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:380)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:397)
> at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:482)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:393)
> 2019-08-27 09:51:14,077 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11714 ms via
>
>
> Thanks & Best Regards,
> Lionel Cao
>
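The exception means that, while replaying the CloseOp, the standby found block #4 of the file with the same block ID (1270602446) but a different generation stamp (759027503 in memory versus 759061086 in the edit). Block names in the log follow the pattern blk_<blockId>_<generationStamp>. The hypothetical Java sketch below decodes those names and applies a simplified version of the comparison the error message suggests: block IDs must match position by position, and a generation-stamp change is accepted only for the last block when the block count is unchanged. That last-block rule is an assumption about FSEditLogLoader.updateBlocks, and the in-memory block list (apart from block #4, which comes from the log) is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the block-list check that failed during edit-log replay. */
public class BlockMismatchCheck {

    /** A block as printed in NameNode logs: blk_<blockId>_<generationStamp>. */
    static final class BlockRef {
        final long id;
        final long genStamp;

        BlockRef(long id, long genStamp) {
            this.id = id;
            this.genStamp = genStamp;
        }

        static BlockRef parse(String name) {
            String[] parts = name.split("_");
            if (parts.length != 3 || !parts[0].equals("blk")) {
                throw new IllegalArgumentException("not a block name: " + name);
            }
            return new BlockRef(Long.parseLong(parts[1]), Long.parseLong(parts[2]));
        }

        @Override
        public String toString() {
            return "blk_" + id + "_" + genStamp;
        }
    }

    static List<BlockRef> parseAll(String... names) {
        List<BlockRef> blocks = new ArrayList<>();
        for (String name : names) {
            blocks.add(BlockRef.parse(name));
        }
        return blocks;
    }

    /**
     * Simplified check (assumption): IDs must match at every position; a changed
     * generation stamp is tolerated only on the last block of a same-length list.
     * Returns null if fromEdit may replace current, else a description of the first mismatch.
     */
    static String firstMismatch(List<BlockRef> current, List<BlockRef> fromEdit) {
        boolean sameCount = current.size() == fromEdit.size();
        int n = Math.min(current.size(), fromEdit.size());
        for (int i = 0; i < n; i++) {
            BlockRef oldBlock = current.get(i);
            BlockRef newBlock = fromEdit.get(i);
            boolean isLast = i == fromEdit.size() - 1;
            boolean genStampChanged = oldBlock.genStamp != newBlock.genStamp;
            if (oldBlock.id != newBlock.id || (genStampChanged && !(sameCount && isLast))) {
                return "replace " + oldBlock + " with " + newBlock
                        + " as block # " + i + "/" + fromEdit.size();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Block list carried by the CloseOp (txid=4359520942) in the log.
        List<BlockRef> fromEdit = parseAll(
                "blk_1270599798_758966421", "blk_1270599852_758967928",
                "blk_1270601282_759026903", "blk_1270602443_759027052",
                "blk_1270602446_759061086", "blk_1270603081_759050235");
        // Hypothetical standby state: identical, except block #4 still has the
        // older generation stamp reported in the exception.
        List<BlockRef> current = new ArrayList<>(fromEdit);
        current.set(4, BlockRef.parse("blk_1270602446_759027503"));
        System.out.println(firstMismatch(current, fromEdit));
    }
}
```

With the CloseOp's block list, main prints "replace blk_1270602446_759027503 with blk_1270602446_759061086 as block # 4/6", the same block pair and position reported in the exception. The sketch only reproduces the symptom; it says nothing about why the two NameNodes ended up with different generation stamps for that block.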

Re: [Help] NameNode error

Posted by Lionel CL <wh...@outlook.com>.
Sorry, I forgot to mention the version: 3.0.0-cdh6.0.1

From: "whucaolu@outlook.com" <wh...@outlook.com>
Date: Wednesday, 28 August 2019 at 10:53 AM
To: Kihwal Lee <ki...@verizonmedia.com>
Cc: "hdfs-dev@hadoop.apache.org" <hd...@hadoop.apache.org>
Subject: Re: [Help] NameNode error

Hi Kihwal,
Thank you for the quick response. I have created a Jira ticket https://issues.apache.org/jira/browse/HDFS-14787
In this ticket I attached some logs, configuration files and code. We performed some append or concat operations on HDFS files, and I'm not sure whether the NN shutdown was caused by those actions.
Could you give some advice?

Thanks & Best Regards,
Lionel Cao


Re: [Help] NameNode error

Posted by Lionel CL <wh...@outlook.com>.
Hi Kihwal,
Thank you for the quick response. I have created a Jira ticket https://issues.apache.org/jira/browse/HDFS-14787
In this ticket I attached some logs, configuration files and code. We performed some append or concat operations on HDFS files, and I'm not sure whether the NN shutdown was caused by those actions.
Could you give some advice?

Thanks & Best Regards,
Lionel Cao
