You are viewing a plain text version of this content. The canonical link for it is here.
Posted to hdfs-user@hadoop.apache.org by Wei Zeng <wz...@shendusou.com> on 2010/07/23 08:46:09 UTC

FSNamesystem initialization failed saveLeases found path /.../0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching entry in namespace.

Hello

    It seems I should send this mail here, got no response from common-user
mail list yesterday.

     I have a hadoop(0.20.2) cluster running with 12 nodes . Today, when I
checking the log of secondaryname node , I found some exceptions like:

*2010-07-22 00:14:28,042 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
0 Total time for transactions(ms): 0Number of trans
actions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
2010-07-22 00:14:28,051 INFO org.apache.hadoop.hdfs.server.common.Storage:
Recovering storage directory /data1/hadoop-0.20.0-cozy/dfs/namesecondary
from fail
ed checkpoint.
2010-07-22 00:14:28,784 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
fsimage size 34054802 bytes.
2010-07-22 00:14:28,803 INFO
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
edits size 1256592 bytes.
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
2010-07-22 00:14:28,878 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
isPermissionEnabled=true
2010-07-22 00:14:28,879 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files = 258478
2010-07-22 00:14:35,019 INFO org.apache.hadoop.hdfs.server.common.Storage:
Number of files under construction = 7
2010-07-22 00:14:35,206 INFO org.apache.hadoop.hdfs.server.common.Storage:
Edits file /data1/hadoop-0.20.0-cozy/dfs/namesecondary/current/edits of size
12565
92 edits # 12121 loaded in 0 seconds.
2010-07-22 00:14:35,895 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
doCheckpoint:
2010-07-22 00:14:35,896 ERROR
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
java.io.IOException: saveLeases found path /mantis_file/antis_2010071
3/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching
entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:592)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
        at
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
        at java.lang.Thread.run(Thread.java:619)*

Then , I closed the whole cluster and restarted namenode. Hoped it can
remove this exception. however, the name node failed when initializing, the
log is:

*10/07/23 00:49:40 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = dev1/192.168.18.3
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
10/07/23 00:49:40 INFO metrics.RpcMetrics: Initializing RPC Metrics with
hostName=NameNode, port=9000
10/07/23 00:49:40 INFO namenode.NameNode: Namenode up at:
dev1.shendusou.com/192.168.18.3:9000
10/07/23 00:49:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=NameNode, sessionId=null
10/07/23 00:49:40 INFO metrics.NameNodeMetrics: Initializing
NameNodeMeterics using context
object:org.apache.hadoop.metrics.spi.NullContext
10/07/23 00:49:40 INFO namenode.FSNamesystem:
fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
10/07/23 00:49:40 INFO namenode.FSNamesystem: supergroup=supergroup
10/07/23 00:49:40 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/07/23 00:49:40 INFO metrics.FSNamesystemMetrics: Initializing
FSNamesystemMetrics using context
object:org.apache.hadoop.metrics.spi.NullContext
10/07/23 00:49:40 INFO namenode.FSNamesystem: Registered
FSNamesystemStatusMBean
10/07/23 00:49:40 INFO common.Storage: Number of files = 258478
10/07/23 00:49:46 INFO common.Storage: Number of files under construction =
7
10/07/23 00:49:46 INFO common.Storage: Image file of size 34054802 loaded in
5 seconds.
10/07/23 00:49:47 INFO common.Storage: Edits file
/data1/hadoop-0.20.0-cozy/dfs/name/current/edits of size 1256592 edits #
12121 loaded in 0 seconds.
10/07/23 00:50:04 INFO namenode.FSNamesystem: Invalid opcode, reached end of
edit log Number of transactions found 958197
10/07/23 00:50:04 INFO common.Storage: Edits file
/data1/hadoop-0.20.0-cozy/dfs/name/current/edits.new of size 104297870 edits
# 958197 loaded in 16 seconds.
10/07/23 00:50:05 ERROR namenode.FSNamesystem: FSNamesystem initialization
failed.
java.io.IOException: saveLeases found path
/mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
but no matching entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
10/07/23 00:50:05 INFO ipc.Server: Stopping server on 9000
10/07/23 00:50:05 ERROR namenode.NameNode: java.io.IOException: saveLeases
found path
/mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
but no matching entry in namespace.
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
        at
org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
        at
org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
        at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
        at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)

10/07/23 00:50:05 INFO namenode.NameNode: SHUTDOWN_MSG: *


I found some JiRAs  about it  (HADOOP-3418, HADOOP-3724) ,however, These
issues are reported in 0.18 and 0.19  , seems all are fixed before 0.20.0,
does it come back in 0.20.2?

I tried to recover it from the checkpoint in secondary name node,  the
problem is , checkpoint is empty.

Any one can tell me how to recover the name node with current  fsimage,
edits and edits.new(or others on namenode) ?  lose some data is ok , lose
all is terrible

Is there a better way to recover the cluster besides formatting namenode?

Any idea is greatly appreciated!

-- 
wzeng

Re: FSNamesystem initialization failed saveLeases found path /.../0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching entry in namespace.

Posted by Todd Lipcon <to...@cloudera.com>.
Hi Wei,

This is an interesting bug - I haven't seen it before in the 0.20 series.

Is your namespace data confidential? If not, would it be possible to upload
a bzip2 of your NN storage directory somewhere publically accessible? Note
that it would not include any file contents, just filenames.

Thanks
-Todd

On Thu, Jul 22, 2010 at 11:46 PM, Wei Zeng <wz...@shendusou.com> wrote:

>
> Hello
>
>     It seems I should send this mail here, got no response from common-user
> mail list yesterday.
>
>      I have a hadoop(0.20.2) cluster running with 12 nodes . Today, when I
> checking the log of secondaryname node , I found some exceptions like:
>
> *2010-07-22 00:14:28,042 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Number of transactions:
> 0 Total time for transactions(ms): 0Number of trans
> actions batched in Syncs: 0 Number of syncs: 0 SyncTimes(ms): 0
> 2010-07-22 00:14:28,051 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Recovering storage directory /data1/hadoop-0.20.0-cozy/dfs/namesecondary
> from fail
> ed checkpoint.
> 2010-07-22 00:14:28,784 INFO
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
> fsimage size 34054802 bytes.
> 2010-07-22 00:14:28,803 INFO
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Downloaded file
> edits size 1256592 bytes.
> 2010-07-22 00:14:28,878 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
> 2010-07-22 00:14:28,878 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem: supergroup=supergroup
> 2010-07-22 00:14:28,878 INFO
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem:
> isPermissionEnabled=true
> 2010-07-22 00:14:28,879 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files = 258478
> 2010-07-22 00:14:35,019 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Number of files under construction = 7
> 2010-07-22 00:14:35,206 INFO org.apache.hadoop.hdfs.server.common.Storage:
> Edits file /data1/hadoop-0.20.0-cozy/dfs/namesecondary/current/edits of size
> 12565
> 92 edits # 12121 loaded in 0 seconds.
> 2010-07-22 00:14:35,895 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Exception in
> doCheckpoint:
> 2010-07-22 00:14:35,896 ERROR
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode:
> java.io.IOException: saveLeases found path /mantis_file/antis_2010071
> 3/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml but no matching
> entry in namespace.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.doMerge(SecondaryNameNode.java:592)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode$CheckpointStorage.access$000(SecondaryNameNode.java:473)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doMerge(SecondaryNameNode.java:350)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.doCheckpoint(SecondaryNameNode.java:314)
>         at
> org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.run(SecondaryNameNode.java:225)
>         at java.lang.Thread.run(Thread.java:619)*
>
> Then , I closed the whole cluster and restarted namenode. Hoped it can
> remove this exception. however, the name node failed when initializing, the
> log is:
>
> *10/07/23 00:49:40 INFO namenode.NameNode: STARTUP_MSG:
> /************************************************************
> STARTUP_MSG: Starting NameNode
> STARTUP_MSG:   host = dev1/192.168.18.3
> STARTUP_MSG:   args = []
> STARTUP_MSG:   version = 0.20.2
> STARTUP_MSG:   build =
> https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r
> 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
> ************************************************************/
> 10/07/23 00:49:40 INFO metrics.RpcMetrics: Initializing RPC Metrics with
> hostName=NameNode, port=9000
> 10/07/23 00:49:40 INFO namenode.NameNode: Namenode up at:
> dev1.shendusou.com/192.168.18.3:9000
> 10/07/23 00:49:40 INFO jvm.JvmMetrics: Initializing JVM Metrics with
> processName=NameNode, sessionId=null
> 10/07/23 00:49:40 INFO metrics.NameNodeMetrics: Initializing
> NameNodeMeterics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 10/07/23 00:49:40 INFO namenode.FSNamesystem:
> fsOwner=cozy,cozy,adm,dialout,cdrom,plugdev,lpadmin,sambashare,admin
> 10/07/23 00:49:40 INFO namenode.FSNamesystem: supergroup=supergroup
> 10/07/23 00:49:40 INFO namenode.FSNamesystem: isPermissionEnabled=true
> 10/07/23 00:49:40 INFO metrics.FSNamesystemMetrics: Initializing
> FSNamesystemMetrics using context
> object:org.apache.hadoop.metrics.spi.NullContext
> 10/07/23 00:49:40 INFO namenode.FSNamesystem: Registered
> FSNamesystemStatusMBean
> 10/07/23 00:49:40 INFO common.Storage: Number of files = 258478
> 10/07/23 00:49:46 INFO common.Storage: Number of files under construction =
> 7
> 10/07/23 00:49:46 INFO common.Storage: Image file of size 34054802 loaded
> in 5 seconds.
> 10/07/23 00:49:47 INFO common.Storage: Edits file
> /data1/hadoop-0.20.0-cozy/dfs/name/current/edits of size 1256592 edits #
> 12121 loaded in 0 seconds.
> 10/07/23 00:50:04 INFO namenode.FSNamesystem: Invalid opcode, reached end
> of edit log Number of transactions found 958197
> 10/07/23 00:50:04 INFO common.Storage: Edits file
> /data1/hadoop-0.20.0-cozy/dfs/name/current/edits.new of size 104297870 edits
> # 958197 loaded in 16 seconds.
> 10/07/23 00:50:05 ERROR namenode.FSNamesystem: FSNamesystem initialization
> failed.
> java.io.IOException: saveLeases found path
> /mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
> but no matching entry in namespace.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
> 10/07/23 00:50:05 INFO ipc.Server: Stopping server on 9000
> 10/07/23 00:50:05 ERROR namenode.NameNode: java.io.IOException: saveLeases
> found path
> /mantis_file/antis_20100713/web_page/wiley_intsci/0/wiley_intsci_0_0_0_0_0_3_2_0.yaml
> but no matching entry in namespace.
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.saveFilesUnderConstruction(FSNamesystem.java:4735)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1029)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSImage.saveFSImage(FSImage.java:1050)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSDirectory.loadFSImage(FSDirectory.java:88)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.initialize(FSNamesystem.java:311)
>         at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.<init>(FSNamesystem.java:292)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:201)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:279)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:956)
>         at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:965)
>
> 10/07/23 00:50:05 INFO namenode.NameNode: SHUTDOWN_MSG: *
>
>
> I found some JiRAs  about it  (HADOOP-3418, HADOOP-3724) ,however, These
> issues are reported in 0.18 and 0.19  , seems all are fixed before 0.20.0,
> does it come back in 0.20.2?
>
> I tried to recover it from the checkpoint in secondary name node,  the
> problem is , checkpoint is empty.
>
> Any one can tell me how to recover the name node with current  fsimage,
> edits and edits.new(or others on namenode) ?  lose some data is ok , lose
> all is terrible
>
> Is there a better way to recover the cluster besides formatting namenode?
>
> Any idea is greatly appreciated!
>
> --
> wzeng
>



-- 
Todd Lipcon
Software Engineer, Cloudera